1
Lal M, Bhardwaj E, Chahar N, Yadav S, Das S. Comprehensive analysis of 1R- and 2R-MYBs reveals novel genic and protein features, complex organisation, selective expansion and insights into evolutionary tendencies. Funct Integr Genomics 2022; 22:371-405. [PMID: 35260976 DOI: 10.1007/s10142-022-00836-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2021] [Revised: 02/10/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
The myeloblastosis (MYB) family, the largest plant transcription factor family, has been subcategorised based on the number and type of repeats in the MYB domain. In spite of several reports, the evolution of MYB genes and repeats remains enigmatic. Brassicaceae members are endowed with complex genomes, including dysploidy, because of the family's unique history of multiple rounds of polyploidisation, genomic fractionation and rearrangement. The present study is an attempt to gain insights into the complexities of MYB family diversity, understand the impact of genome evolution on gene families and develop an evolutionary framework to understand the origin of the various subcategories of the MYB gene family. We identified and analysed 1129 MYBs, including 1R-, 2R-, 3R- and atypical MYBs, across sixteen species representing protists, fungi, animals and plants, excluding MYBs from Brassicaceae other than Arabidopsis thaliana; in addition, a total of 1137 2R-MYB genes from six Brassicaceae species were analysed. Comparative analysis revealed a predominance of 1R-MYBs in protists, fungi, animals and lower plants. Phylogenetic reconstruction and analysis of selection pressure suggested an ancestral nature of 1R-MYBs containing the R1-type repeat, which might have undergone intragenic duplication to form multi-repeat MYBs. Distinct differences in gene structure between 1R-MYBs and 2R-MYBs were observed regarding intron number, the ratio of gene length to coding DNA sequence (CDS) length and the length of exons encoding the MYB domain. Conserved as well as novel and lineage-specific intron phases were identified. Analyses of physicochemical properties revealed drastic differences, indicating functional diversification in MYBs. Phylogenetic reconstruction of 1R- and 2R-MYB genes revealed a shared structure-function relationship within clades, which was supported when transcriptome data were analysed in silico.
Comparative genomic analysis of the distribution and mapping of 2R-MYBs revealed congruence and a greater degree of synteny and collinearity among closely related species. Micro-synteny analysis of genomic segments revealed high conservation of the genes immediately flanking tandemly organised 2R-MYBs, along with instances of local duplication, reorganisation and genome fractionation. In summary, polyploidy, dysploidy, reshuffling and genome fractionation were found to cause loss or gain of 2R-MYB genes. The findings need to be supported with functional validation to understand the gene structure-function relationship along the evolutionary lineage and adaptive strategies based on comparative functional genomics in plants.
Affiliation(s)
- Mukund Lal
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Ekta Bhardwaj
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Nishu Chahar
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Shobha Yadav
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Sandip Das
  - Department of Botany, University of Delhi, Delhi, 110007, India
2
Zhang B, Chen S, Liu J, Yan YB, Chen J, Li D, Liu JY. A High-Quality Haplotype-Resolved Genome of Common Bermudagrass (Cynodon dactylon L.) Provides Insights Into Polyploid Genome Stability and Prostrate Growth. Front Plant Sci 2022; 13:890980. [PMID: 35548270 PMCID: PMC9081840 DOI: 10.3389/fpls.2022.890980] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/07/2022] [Accepted: 04/04/2022] [Indexed: 05/03/2023]
Abstract
Common bermudagrass (Cynodon dactylon L.) is an important perennial warm-season turfgrass species with great economic value. However, a reference genome for C. dactylon is still lacking, which severely impedes basic research and breeding. In this study, a high-quality haplotype-resolved genome of C. dactylon cultivar Yangjiang was assembled using a combination of multiple sequencing strategies. The assembled genome is approximately 1.01 Gb in size and comprises 36 pseudochromosomes belonging to four haplotypes. In total, 76,879 protein-coding genes and 529,092 repeat sequences were annotated in the assembled genome. Evolutionary analysis indicated that C. dactylon underwent two rounds of whole-genome duplication, whereas syntenic and transcriptome analyses revealed that global subgenome dominance was absent among the four haplotypes. Genome-wide gene family analyses further indicated that homologous recombination-regulating genes and tiller-angle-regulating genes all showed adaptive evolution in C. dactylon, providing insights into genome-scale regulation of polyploid genome stability and prostrate growth. These results not only facilitate a better understanding of the complex genome composition and unique plant architectural characteristics of common bermudagrass, but also offer a valuable resource for comparative genome analyses of turfgrasses and other plant species.
Affiliation(s)
- Bing Zhang
  - School of Life Sciences, Tsinghua University, Beijing, China
  - College of Animal Science and Technology, Yangzhou University, Yangzhou, China
- Si Chen
  - College of Animal Science and Technology, Yangzhou University, Yangzhou, China
- Jianxiu Liu
  - Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Yong-Bin Yan
  - School of Life Sciences, Tsinghua University, Beijing, China
- Jingbo Chen
  - Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Dandan Li
  - Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
- Jin-Yuan Liu
  - School of Life Sciences, Tsinghua University, Beijing, China
  - *Correspondence: Jin-Yuan Liu
3
Pachganov S, Murtazalieva K, Zarubin A, Taran T, Chartier D, Tatarinova TV. Prediction of Rice Transcription Start Sites Using TransPrise: A Novel Machine Learning Approach. Methods Mol Biol 2021; 2238:261-274. [PMID: 33471337 DOI: 10.1007/978-1-0716-1068-8_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper, we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer with a graphics processing unit, the run time of TransPrise is 250 min on a 374 Mb genome. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all the necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and can be customized to predict TSS in any eukaryotic organism.
Affiliation(s)
- Stepan Pachganov
  - Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
- Alexei Zarubin
  - Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics, Tomsk, Russia
- Duane Chartier
  - International Center for Art Intelligence, Inc, Los Angeles, CA, USA
- Tatiana V Tatarinova
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Department of Biology, University of La Verne, La Verne, CA, USA
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Siberian Federal University, Krasnoyarsk, Russia
4
Pachganov S, Murtazalieva K, Zarubin A, Sokolov D, Chartier DR, Tatarinova TV. TransPrise: a novel machine learning approach for eukaryotic promoter prediction. PeerJ 2019; 7:e7990. [PMID: 31695967 PMCID: PMC6827441 DOI: 10.7717/peerj.7990] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/09/2019] [Accepted: 10/04/2019] [Indexed: 02/01/2023]
Abstract
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows, in which the precise location of the TSS is identified. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of the TransPrise classification and regression models with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. The Matthews correlation coefficient for TransPrise is 0.79, more than twice the 0.31 obtained for the TSSPlant classification models. This represents a high level of prediction accuracy. Additionally, the mean absolute error of the regression model is 29.19 nt, allowing accurate prediction of TSS location. TransPrise was also tested in Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSS in any eukaryotic organism.
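The two accuracy figures quoted in this abstract, the Matthews correlation coefficient (MCC) for the classification step and the mean absolute error (MAE) for the regression step, follow standard definitions. A minimal Python sketch of both is given here for readers who want to reproduce such comparisons; the function names and the toy inputs are illustrative assumptions and are not taken from the TransPrise source code.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from the four cells of a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def mean_absolute_error(predicted, observed) -> float:
    """Average absolute distance (here in nt) between predicted and true TSS."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Hypothetical counts and positions, for illustration only.
    print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))
    print(mean_absolute_error([100, 130], [110, 100]))
```

MCC ranges from -1 to 1 and stays informative when the classes are imbalanced, which is the usual situation when TSS-containing windows are rare compared with background sequence.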
Affiliation(s)
- Stepan Pachganov
  - Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
- Khalimat Murtazalieva
  - Vavilov Institute for General Genetics, Moscow, Russia
  - Institute of Bioinformatics, Moscow, Russia
- Aleksei Zarubin
  - Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics, Tomsk, Russia
- Duane R Chartier
  - International Center for Art Intelligence, Inc., Los Angeles, CA, United States of America
- Tatiana V Tatarinova
  - Vavilov Institute for General Genetics, Moscow, Russia
  - Department of Biology, University of La Verne, La Verne, CA, United States of America
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Siberian Federal University, Krasnoyarsk, Russia
5
Characterization of Solanum melongena Thioesterases Related to Tomato Methylketone Synthase 2. Genes (Basel) 2019; 10:genes10070549. [PMID: 31323901 PMCID: PMC6678348 DOI: 10.3390/genes10070549] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/15/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 11/16/2022]
Abstract
2-Methylketones are involved in plant defense and fragrance and have industrial applications as flavor additives and in biofuel production. We isolated three genes from the crop plant Solanum melongena (eggplant) and investigated them as candidates for methylketone production. The wild tomato methylketone synthase 2 (ShMKS2), which hydrolyzes β-ketoacyl-acyl carrier proteins (ACP) to release β-ketoacids in the penultimate step of methylketone synthesis, was used as a query to identify three homologs from S. melongena: SmMKS2-1, SmMKS2-2, and SmMKS2-3. Expression and functional characterization of the SmMKS2s in E. coli showed that SmMKS2-1 and SmMKS2-2 exhibited thioesterase activity against different β-ketoacyl-ACP substrates, generating the corresponding saturated and unsaturated β-ketoacids, which can undergo decarboxylation to form their respective 2-methylketone products, whereas SmMKS2-3 showed no activity. SmMKS2-1 was expressed at high levels in leaves, stems, roots, flowers, and fruits, whereas SmMKS2-2 and SmMKS2-3 were expressed mainly in flowers and fruits, respectively. Expression of SmMKS2-1 was induced in leaves by mechanical wounding and by methyl jasmonate or methyl salicylate, but SmMKS2-2 and SmMKS2-3 were not induced. SmMKS2-1 is a candidate for methylketone-based defense in eggplant, and both SmMKS2-1 and SmMKS2-2 are novel MKS2 enzymes for the biosynthesis of methylketones as feedstocks for biofuel production.
6
Liu W, Zhang Z, Zhu W, Ren Z, Jia L, Li W, Ma Z. Evolutionary Conservation and Divergence of Genes Encoding 3-Hydroxy-3-methylglutaryl Coenzyme A Synthase in the Allotetraploid Cotton Species Gossypium hirsutum. Cells 2019; 8:cells8050412. [PMID: 31058869 PMCID: PMC6562921 DOI: 10.3390/cells8050412] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 04/28/2019] [Accepted: 05/01/2019] [Indexed: 11/16/2022]
Abstract
Polyploidization is important for the speciation and subsequent evolution of many plant species. Analyses of the duplicated genes produced via polyploidization events may clarify the origin and evolution of gene families. During terpene biosynthesis, 3-hydroxy-3-methylglutaryl coenzyme A synthase (HMGS) functions as a key enzyme in the mevalonate pathway. In this study, we first identified a total of 53 HMGS genes in 23 land plant species, while no HMGS genes were detected in three green algae species. The phylogenetic analysis suggested that plant HMGS genes may have originated from a common ancestral gene before clustering in different branches during the divergence of plant lineages. We then detected six HMGS genes in the allotetraploid cotton species Gossypium hirsutum, twice the number found in each of the two diploid cotton species (Gossypium raimondii and Gossypium arboreum). The comparison of gene structures and phylogenetic analysis of HMGS genes revealed conserved evolution during polyploidization in Gossypium. Moreover, the expression patterns indicated that the six GhHMGS genes were expressed in all tested tissues, with most genes considerably expressed in the roots, and that they were responsive to various phytohormone treatments and abiotic stresses. The sequence and expression divergence of duplicated genes in G. hirsutum implied the sub-functionalization of GhHMGS1A and GhHMGS1D as well as of GhHMGS3A and GhHMGS3D, whereas it implied the pseudogenization of GhHMGS2A and GhHMGS2D. Collectively, our study unraveled the evolutionary history of HMGS genes in green plants and from diploid to allotetraploid cotton, and illustrated the different evolutionary fates of duplicated HMGS genes resulting from polyploidization.
Affiliation(s)
- Wei Liu
  - Collaborative Innovation Center of Henan Grain Crops/Agronomy College, Henan Agricultural University, Zhengzhou 450002, China
- Zhiqiang Zhang
  - Collaborative Innovation Center of Henan Grain Crops/Agronomy College, Henan Agricultural University, Zhengzhou 450002, China
- Wei Zhu
  - Collaborative Innovation Center of Henan Grain Crops/Agronomy College, Henan Agricultural University, Zhengzhou 450002, China
- Zhongying Ren
  - State Key Laboratory of Cotton Biology/Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Lin Jia
  - Collaborative Innovation Center of Henan Grain Crops/Agronomy College, Henan Agricultural University, Zhengzhou 450002, China
- Wei Li
  - State Key Laboratory of Cotton Biology/Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Zongbin Ma
  - Collaborative Innovation Center of Henan Grain Crops/Agronomy College, Henan Agricultural University, Zhengzhou 450002, China
7
Genome-Wide Identification and Comparative Analysis of the 3-Hydroxy-3-methylglutaryl Coenzyme A Reductase (HMGR) Gene Family in Gossypium. Molecules 2018; 23:molecules23020193. [PMID: 29364830 PMCID: PMC6017885 DOI: 10.3390/molecules23020193] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/13/2017] [Revised: 01/19/2018] [Accepted: 01/21/2018] [Indexed: 11/25/2022]
Abstract
Terpenes are the largest and most diverse class of secondary metabolites in plants and play a very important role in plant adaptation to the environment. 3-Hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) is a rate-limiting enzyme in terpene biosynthesis in the cytosol. A previous study found that HMGR genes underwent expansion in Gossypium raimondii, but the characteristics and evolution of the HMGR gene family in the genus Gossypium are unclear. In this study, genome-wide identification and a comparative study of the HMGR gene family were carried out in three Gossypium species with genome sequences, i.e., G. raimondii, Gossypium arboreum, and Gossypium hirsutum. In total, nine, nine and 18 HMGR genes were identified in G. raimondii, G. arboreum, and G. hirsutum, respectively. The results indicated that the HMGR genes underwent expansion, and a unique gene cluster containing four HMGR genes was found in all three Gossypium species. The phylogenetic analysis suggested that the expansion of HMGR genes had occurred in their common ancestor. In G. arboreum and in the A-subgenome of G. hirsutum, one gene was a pseudogene: a 10-bp deletion caused a frameshift mutation, so it could not be translated into a functional protein. The expression profiles of the two pseudogenes showed that they had tissue-specific expression. Additionally, the expression pattern of the pseudogene in the A-subgenome of G. hirsutum was similar to that of its paralogous gene in the D-subgenome of G. hirsutum. Our results provide useful information for understanding cytosolic terpene biosynthesis in Gossypium species.
8
Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017; 12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023]
Abstract
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites (TSS) and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into “promoters” and “non-promoters” even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors (TFs): those that bind preferentially to the [-500,0] region (188 “promoter-specific” TFs), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 “promiscuous” TFs with little or no location preference with respect to the TSS. For the most informative motifs, positional preferences are conserved between dicots and monocots.
Affiliation(s)
- Martin Triska
  - Children’s Hospital Los Angeles, University of Southern California, Los Angeles, CA, United States of America
  - Faculty of Advanced Technology, University of South Wales, Pontypridd, Wales, United Kingdom
- Ancha Baranova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Research Centre for Medical Genetics, Moscow, Russia
- Alexander Kel
  - geneXplain GmbH, Wolfenbuettel, Germany
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
- Tatiana V. Tatarinova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Department of Biology, Division of Natural Sciences, University of La Verne, La Verne, CA, United States of America
  - Bioinformatics Center, AA Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
  - Vavilov’s Institute for General Genetics, Moscow, Russia
9
Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017; 12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022]
Abstract
Background: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, the identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Results: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
Conclusions: We present an accurate and comprehensive annotation of the oil palm genome, focusing on the analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. Reviewers: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
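The GC3 statistic used in this abstract is defined there as the fraction of cytosine and guanine in the third position of a codon. A minimal Python sketch of that calculation follows, assuming the input is an in-frame coding sequence; the function names and the example sequences are hypothetical, and only the 0.75286 cut-off is taken from the abstract.

```python
# Cut-off for "GC3-rich" genes as quoted in the abstract above.
GC3_RICH_THRESHOLD = 0.75286


def gc3(cds: str) -> float:
    """Fraction of G or C bases at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # ignore a trailing partial codon
    third_positions = seq[2:usable:3]  # bases at indices 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence must contain at least one full codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def is_gc3_rich(cds: str) -> bool:
    """True when the sequence meets the GC3-rich cut-off from the abstract."""
    return gc3(cds) >= GC3_RICH_THRESHOLD


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    for example in ("ATGGCCAAG", "ATGGCTAAA"):
        print(example, round(gc3(example), 3), is_gc3_rich(example))
```

In practice the sequences would come from the annotated coding regions of a genome rather than from hand-written strings, and ambiguous bases such as N would need an explicit policy; here they simply count as non-GC.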
Affiliation(s)
- Kuang-Lim Chan
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Tatiana V Tatarinova
  - Department of Biology, University of La Verne, La Verne, California, 91750, USA
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Rozana Rosli
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
  - Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Nadzirah Amiruddin
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Norazah Azizi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Mohd Amin Ab Halim
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nik Shazana Nik Mohd Sanusi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nagappan Jayanthi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Petr Ponomarenko
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Martin Triska
  - Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
- Victor Solovyev
  - Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
- Mohd Firdaus-Raih
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Ravigadevi Sambanthamurthi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Denis Murphy
  - Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Eng-Ti Leslie Low
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
10
Xiao J, Sekhwal MK, Li P, Ragupathy R, Cloutier S, Wang X, You FM. Pseudogenes and Their Genome-Wide Prediction in Plants. Int J Mol Sci 2016; 17:E1991. [PMID: 27916797 PMCID: PMC5187791 DOI: 10.3390/ijms17121991] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/15/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 11/17/2022]
Abstract
Pseudogenes are paralogs generated from ancestral functional genes (parents) during genome evolution that contain critical defects in their sequences, such as a missing promoter, a premature stop codon or frameshift mutations. Generally, pseudogenes are functionless, but recent evidence demonstrates that some of them have potential roles in regulation. The majority of pseudogenes are generated from functional progenitor genes either by gene duplication (duplicated pseudogenes) or by retro-transposition (processed pseudogenes). Pseudogenes are primarily identified by comparison to their parent genes. Bioinformatics tools for pseudogene prediction have been developed, among which PseudoPipe, PSF and Shiu's pipeline are publicly available. We compared these three tools using the well-annotated Arabidopsis thaliana genome and its 924 known pseudogenes as a test data set. PseudoPipe and Shiu's pipeline identified ~80% of A. thaliana pseudogenes, of which 94% were shared, while PSF failed to generate adequate results. A need to improve the accuracy of bioinformatics tools for pseudogene prediction in plant genomes was thus identified, with the ultimate goal of improving the quality of genome annotation in plants.
Affiliation(s)
- Jin Xiao
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
  - Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China
- Manoj Kumar Sekhwal
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
  - Department of Soil Science, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
- Pingchuan Li
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
- Raja Ragupathy
  - Department of Plant Science, University of Saskatchewan, Saskatoon, SK S7N 5A2, Canada
- Sylvie Cloutier
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada
- Xiue Wang
  - Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China
- Frank M You
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
11
Kuderová A, Gallová L, Kuricová K, Nejedlá E, Čurdová A, Micenková L, Plíhal O, Šmajs D, Spíchal L, Hejátko J. Identification of AHK2- and AHK3-like cytokinin receptors in Brassica napus reveals two subfamilies of AHK2 orthologues. J Exp Bot 2015; 66:339-53. [PMID: 25336686 DOI: 10.1093/jxb/eru422] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 05/10/2023]
Abstract
Cytokinin (CK) signalling is known to play key roles in the regulation of plant growth and development, crop yields, and tolerance to both abiotic stress and pathogen defences, but the mechanisms involved are poorly characterized in dicotyledonous crops. Here the identification and functional characterization of sensor histidine kinases homologous to Arabidopsis CK receptors AHK2 and AHK3 in winter oilseed rape are presented. Five CHASE-containing His kinases were identified in Brassica napus var. Tapidor (BnCHK1-BnCHK5) by heterologous hybridization of its genomic library with gene-specific probes from Arabidopsis. The identified bacterial artificial chromosome (BAC) clones were fingerprinted and representative clones in five distinct groups were sequenced. Using a bioinformatic approach and cDNA cloning, the precise gene and putative protein domain structures were determined. Based on phylogenetic analysis, four AHK2 (BnCHK1-BnCHK4) homologues and one AHK3 (BnCHK5) homologue were defined. It is further suggested that BnCHK1 and BnCHK3, and BnCHK5 are orthologues of AHK2 and AHK3, originally from the B. rapa A genome, respectively. BnCHK1, BnCHK3, and BnCHK5 displayed high affinity for trans-zeatin (1-3nM) in a live-cell competitive receptor assay, but not with other plant hormones (indole acetic acid, GA3, and abscisic acid), confirming the prediction that they are genuine CK receptors. It is shown that BnCHK1 and BnCHK3, and BnCHK5 display distinct preferences for various CK bases and metabolites, characteristic of their AHK counterparts, AHK2 and AHK3, respectively. Interestingly, the AHK2 homologues could be divided into two subfamilies (BnCHK1/BnCK2 and BnCHK3/BnCHK4) that differ in putative transmembrane domain topology and CK binding specificity, thus implying potential functional divergence.
Affiliation(s)
- Alena Kuderová
- Functional Genomics and Proteomics of Plants, Central European Institute of Technology (CEITEC), Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Lucia Gallová
- Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 11, 783 71 Olomouc, Czech Republic
| | - Katarína Kuricová
- Functional Genomics and Proteomics of Plants, Central European Institute of Technology (CEITEC), Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Eliška Nejedlá
- Functional Genomics and Proteomics of Plants, Central European Institute of Technology (CEITEC), Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Anna Čurdová
- Functional Genomics and Proteomics of Plants, Central European Institute of Technology (CEITEC), Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Lenka Micenková
- Faculty of Medicine, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Ondřej Plíhal
- Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 11, 783 71 Olomouc, Czech Republic
| | - David Šmajs
- Faculty of Medicine, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
| | - Lukáš Spíchal
- Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 11, 783 71 Olomouc, Czech Republic
| | - Jan Hejátko
- Functional Genomics and Proteomics of Plants, Central European Institute of Technology (CEITEC), Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
|
12
|
Nasiri J, Naghavi M, Rad SN, Yolmeh T, Shirazi M, Naderi R, Nasiri M, Ahmadi S. Gene identification programs in bread wheat: a comparison study. NUCLEOSIDES NUCLEOTIDES & NUCLEIC ACIDS 2014; 32:529-54. [PMID: 24124688 DOI: 10.1080/15257770.2013.832773] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Seven ab initio web-based gene prediction programs (AUGUSTUS, BGF, Fgenesh, Fgenesh+, GeneID, Genemark.hmm, and HMMgene) were assessed to compare their prediction accuracy using protein-coding sequences of bread wheat. At both the nucleotide and exon levels, Fgenesh+ was the best-performing program, followed by BGF and then Fgenesh. At the gene level, by contrast, Fgenesh performed best, predicting more than 75% of all genes exactly. Programs such as Fgenesh+, BGF, and Fgenesh, which yielded the highest percentage of correctly predicted exons, appear to give more trustworthy results, whereas with GeneID and HMMgene the percentage of false negatives would be expected to increase. For initial exons, the 3' boundary was recognized accurately significantly more often than the 5' boundary, and the reverse was true for terminal exons. Lastly, HMMgene and Genemark.hmm were, overall, largely insensitive to GC content, while the other programs appear to be slightly more sensitive when GC-poor sequences are used. Overall, our results indicate that gene finders still need additional improvement before they deliver consistently accurate predictions.
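The abstract compares programs at the nucleotide, exon, and gene levels but does not state the formulas used. As a minimal sketch, assuming the conventional exon-level definitions (sensitivity = correct exons / annotated exons; specificity = correct exons / predicted exons, with an exon counted as correct only when both boundaries match), the comparison could be computed as below. The function name and the coordinates are hypothetical and are not taken from the study.

```python
def exon_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples. A predicted exon counts as correct
    only if both boundaries match an annotated exon exactly
    (an assumed convention; the abstract does not spell it out).
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = annotated_set & predicted_set
    sensitivity = len(correct) / len(annotated_set) if annotated_set else 0.0
    specificity = len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
annotated_exons = [(100, 250), (400, 520), (700, 910), (1000, 1100)]
predicted_exons = [(100, 250), (400, 530), (700, 910)]

sn, sp = exon_level_accuracy(annotated_exons, predicted_exons)
```

In this toy case the second predicted exon misses its 3' boundary by ten bases, so it counts as wrong under the exact-match rule; a missed annotated exon lowers sensitivity, which is how a higher false-negative rate would show up.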
Affiliation(s)
- Jaber Nasiri
- a Department of Agronomy and Plant Breeding, Division of Molecular Plant Genetics, College of Agricultural & Natural Resources , University of Tehran , Karaj , Tehran , Iran
|
13
|
Rossato DO, Ludwig A, Deprá M, Loreto ELS, Ruiz A, Valente VLS. BuT2 is a member of the third major group of hAT transposons and is involved in horizontal transfer events in the genus Drosophila. Genome Biol Evol 2014; 6:352-65. [PMID: 24459285 PMCID: PMC3942097 DOI: 10.1093/gbe/evu017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Accepted: 01/15/2014] [Indexed: 12/24/2022] Open
Abstract
The hAT superfamily comprises a large and diverse array of DNA transposons found in all supergroups of eukaryotes. Here we characterized the Drosophila buzzatii BuT2 element and found that it harbors a five-exon gene encoding a 643-aa putatively functional transposase. A phylogeny built with 85 hAT transposases yielded, in addition to the two major groups already described, Ac and Buster, a third one comprising 20 sequences that includes BuT2, Tip100, hAT-4_BM, and RP-hAT1. This third group is here named Tip. In addition, we studied the phylogenetic distribution and evolution of BuT2 by in silico searches and molecular approaches. Our data revealed that BuT2 was, most often, vertically transmitted during the evolution of the genus Drosophila, being lost independently in several species. Nevertheless, we propose the occurrence of three horizontal transfer events to explain its distribution and conservation among species. Another aspect of BuT2 evolution and life cycle is the presence of short related sequences, which contain similar 5' and 3' regions, including the terminal inverted repeats. These sequences, which can be considered miniature inverted-repeat transposable elements, probably originated by internal deletion of complete copies and show evidence of recent mobilization.
Affiliation(s)
- Dirleane Ottonelli Rossato
- Programa de Pós-Graduação em Ecologia, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
| | - Adriana Ludwig
- Laboratório de Genômica Funcional, Instituto Carlos Chagas (ICC), Fiocruz-PR, Curitiba, Paraná, Brazil
| | - Maríndia Deprá
- Programa de Pós-Graduação em Biologia Animal, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
| | - Elgion L. S. Loreto
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Departamento de Biologia, Universidade Federal de Santa Maria (UFSM), Santa Maria, Rio Grande do Sul, Brazil
| | - Alfredo Ruiz
- Departament de Genètica i Microbiologia, Facultat de Biociències, Universitat Autònoma de Barcelona, Spain
| | - Vera L. S. Valente
- Programa de Pós-Graduação em Biologia Animal, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
|
14
|
Numa H, Itoh T. MEGANTE: a web-based system for integrated plant genome annotation. PLANT & CELL PHYSIOLOGY 2014; 55:e2. [PMID: 24253915 PMCID: PMC3894707 DOI: 10.1093/pcp/pct157] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 05/10/2023]
Abstract
The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demand for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon-intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, is also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/.
Affiliation(s)
| | - Takeshi Itoh
- Corresponding author: Fax, +81-29-838-7065
|
15
|
Zhou P, Silverstein KAT, Gao L, Walton JD, Nallu S, Guhlin J, Young ND. Detecting small plant peptides using SPADA (Small Peptide Alignment Discovery Application). BMC Bioinformatics 2013; 14:335. [PMID: 24256031 PMCID: PMC3924332 DOI: 10.1186/1471-2105-14-335] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Received: 08/02/2013] [Accepted: 11/15/2013] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Small peptides encoded as one- or two-exon genes in plants have recently been shown to affect multiple aspects of plant development, reproduction and defense responses. However, popular similarity search tools and gene prediction techniques generally fail to identify most members belonging to this class of genes. This is largely due to the high sequence divergence among family members and the limited availability of experimentally verified small peptides to use as training sets for homology search and ab initio prediction. Consequently, there is an urgent need for both experimental and computational studies in order to further advance the accurate prediction of small peptides. RESULTS: We present here a homology-based gene prediction program to accurately predict small peptides at the genome level. Given a high-quality profile alignment, SPADA identifies and annotates nearly all family members in tested genomes with better performance than all general-purpose gene prediction programs surveyed. We find numerous mis-annotations in the current Arabidopsis thaliana and Medicago truncatula genome databases using SPADA, most of which have RNA-Seq expression support. We also show that SPADA works well on other classes of small secreted peptides in plants (e.g., self-incompatibility protein homologues) as well as non-secreted peptides outside the plant kingdom (e.g., the alpha-amanitin toxin gene family in the mushroom, Amanita bisporigera). CONCLUSIONS: SPADA is a free software tool that accurately identifies and predicts the gene structure for short peptides with one or two exons. SPADA is able to incorporate information from profile alignments into the model prediction process and makes use of it to score different candidate models. SPADA achieves high sensitivity and specificity in predicting small plant peptides such as the cysteine-rich peptide families. A systematic application of SPADA to other classes of small peptides by research communities will greatly improve the genome annotation of different protein families in public genome databases.
Affiliation(s)
- Peng Zhou
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
| | - Kevin AT Silverstein
- Supercomputing Institute for Advanced Computational Research, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | - Liangliang Gao
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
| | - Jonathan D Walton
- Department of Plant Biology and U.S. Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, Michigan 48824, USA
| | - Sumitha Nallu
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
| | - Joseph Guhlin
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
| | - Nevin D Young
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108, USA
|
16
|
Zhu Q, Bennetzen JL, Smith SM. Isolation and diversity analysis of resistance gene homologues from switchgrass. G3 (BETHESDA, MD.) 2013; 3:1031-42. [PMID: 23589518 PMCID: PMC3689800 DOI: 10.1534/g3.112.005447] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/18/2012] [Accepted: 04/10/2013] [Indexed: 12/31/2022]
Abstract
Resistance gene homologs (RGHs) were isolated from the switchgrass variety Alamo by a combination of polymerase chain reaction and expressed sequence tag (EST) database mining. Fifty-eight RGHs were isolated by polymerase chain reaction and 295 RGHs were identified in 424,545 switchgrass ESTs. Four nucleotide-binding site-leucine-rich repeat (NBS-LRR) RGHs were selected to investigate RGH haplotypic diversity in seven switchgrass varieties chosen for their representation of a broad range of the switchgrass germplasm. Lowland and upland ecotypes were found to be less similar, even from nearby populations, than were more distant populations with similar growth environments. Most (83.5%) of the variability in these four RGHs was found to be attributable to the within-population component. The difference in nucleotide diversity between and within populations was observed to be small, and this diversity is maintained to similar degrees at both population and ecotype levels. The results also revealed that the analyzed RGHs were under positive selection in the studied switchgrass accessions. Intragenic recombination was detected in switchgrass RGHs, thereby demonstrating an active genetic process that has the potential to generate new resistance genes with new specificities that might act against newly arising pathogen races.
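The abstract reports nucleotide diversity within and between populations without giving the estimator. As a rough sketch, assuming the usual definition (the average number of pairwise differences per site among aligned sequences), it could be computed as follows. The function name and the toy haplotypes are hypothetical and do not come from the switchgrass data.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences.

    Assumes equal-length, gap-free aligned sequences; this is the
    textbook estimator and not necessarily the one used in the study.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical haplotypes, for illustration only.
haplotypes = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(haplotypes)
```

Running the same function on sequences pooled within one population and on sequences drawn across populations gives the two quantities whose difference the abstract describes as small.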
Affiliation(s)
- Qihui Zhu
- Department of Genetics, The University of Georgia, Athens, Georgia 30602
| | | | - Shavannor M. Smith
- Department of Plant Pathology, The University of Georgia, Athens, Georgia 30602
|
17
|
|
18
|
Abstract
Background: Miniature inverted-repeat transposable elements (MITEs) are short, nonautonomous DNA elements flanked by subterminal or terminal inverted repeats (TIRs) with no coding capacity. MITEs were originally recognized as important components of plant genomes, where they can attain extremely high copy numbers, and are also found in several animal genomes, including mosquitoes, fish and humans. So far, few MITEs have been described in Drosophila. Results: Herein we describe the distribution and evolution of Mar, a MITE family of hAT transposons, in Drosophilidae species. In silico searches and PCR screening showed that Mar distribution is restricted to the willistoni subgroup of the Drosophila species, and a phylogenetic analysis of Mar indicates that this element may have originated prior to the diversification of these species. Most of the Mar copies in D. willistoni present conserved target site duplications and TIRs, indicating recent mobilization of these sequences. We also identified relic copies of a potentially full-length Mar transposon in D. tropicalis and D. willistoni. The phylogenetic relationship among transposases from the putative full-length Mar and other hAT superfamily elements revealed that Mar is placed into the recently determined Buster group of hAT transposons. Conclusion: On the basis of the obtained data, we suggest that these Mar MITEs originated before the speciation of the willistoni subgroup, which started about 5.7 Mya. The existence of the Mar relic transposase indicates that these MITEs originated by internal deletions and suggests that the full-length transposon was recently functional in D. willistoni, promoting the mobilization of Mar MITEs.
|
19
|
Ouyang S, Thibaud-Nissen F, Childs KL, Zhu W, Buell CR. Plant genome annotation methods. Methods Mol Biol 2009; 513:263-82. [PMID: 19347655 DOI: 10.1007/978-1-59745-427-8_14] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 03/24/2023]
Abstract
Annotation of plant genomic sequences can be separated into structural and functional annotation. Structural annotation is the foundation of all genomics because, without accurate gene models, understanding gene function or the evolution of genes across taxa can be impeded. Structural annotation is dependent on sensitive, specific computational programs and deep experimental evidence to identify gene features within genomic DNA. Functional annotation is highly dependent on sequence similarity to other known genes or proteins, as the majority of initial "first-pass" functional annotation on a genomic scale is transitive. Coupling structural and functional annotation across genomes in a comparative manner promotes more accurate annotation as well as an understanding of gene and genome evolution. With the increasing availability of plant genome sequence data, the value of comparative annotation will increase. As with any new field, methodologies are evolving for genome annotation and will improve in the future.
Affiliation(s)
- Shu Ouyang
- The Institute for Genomic Research, Rockville, MD, USA
|
20
|
van Erp H, Walton JD. Regulation of the cellulose synthase-like gene family by light in the maize mesocotyl. PLANTA 2009; 229:885-897. [PMID: 19130077 DOI: 10.1007/s00425-008-0881-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/30/2008] [Accepted: 12/18/2008] [Indexed: 05/27/2023]
Abstract
The cellulose synthase-like (ZmCSL) gene family of maize was annotated and its expression studied in the maize mesocotyl. A total of 28 full-length CSL genes and another 13 partial sequences were annotated; four are predicted to be pseudogenes. Maize has all of the CSL subfamilies that are present in rice, but the CSLC subfamily is expanded from 6 in rice to 12 in maize, and the CSLH subfamily might be reduced from 3 to 1. Unlike rice, maize has a gene in the CSLG subfamily, based on its sequence similarity to two genes annotated as CSLG in poplar. Light regulation of glycan synthase enzyme activities and CSL gene expression were analyzed in the mesocotyl. A Golgi-localized glucan synthase activity is reduced by ~50% 12 h after exposure to light. beta-1,4-Mannan synthase activity is reduced even more strongly (>85%), whereas beta-1,4-xylan synthase, callose synthase, and latent IDPase activity respond only slightly, if at all, to light. At least 17 of the CSL genes (42%) are expressed in the mesocotyl, of which four are up-regulated at least twofold, seven are down-regulated at least twofold, and six are not affected by light. The results contribute to our understanding of the structure of the CSL gene family in an important food and biofuel plant, show that a large percentage of the CSL genes are expressed in the specialized tissues of the mesocotyl, and demonstrate that members of the CSL gene family are differentially subject to photobiological regulation.
Affiliation(s)
- Harrie van Erp
- Department of Energy-Plant Research Laboratory, Michigan State University, E. Lansing, MI 48824, USA
|
21
|
Madeira L, Galante PAF, Budu A, Azevedo MF, Malnic B, Garcia CRS. Genome-wide detection of serpentine receptor-like proteins in malaria parasites. PLoS One 2008; 3:e1889. [PMID: 18365025 PMCID: PMC2268965 DOI: 10.1371/journal.pone.0001889] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 12/10/2007] [Accepted: 02/21/2008] [Indexed: 11/19/2022] Open
Abstract
Serpentine receptors comprise a large family of membrane receptors distributed over diverse organisms, such as bacteria, fungi, plants and all metazoans. However, the presence of serpentine receptors in protozoan parasites is largely unknown so far. In the present study we performed a genome-wide search for proteins containing seven transmembrane domains (7-TM) in the human malaria parasite Plasmodium falciparum and identified four serpentine receptor-like proteins. These proteins, denoted PfSR1, PfSR10, PfSR12 and PfSR25, show membrane topologies that resemble those exhibited by members belonging to different families of serpentine receptors. Expression of the pfsrs genes was detected by Real Time PCR in P. falciparum intraerythrocytic stages, indicating that they potentially code for functional proteins. We also found corresponding homologues for the PfSRs in five other Plasmodium species, two primate and three rodent parasites. PfSR10 and 25 are the most conserved receptors among the different species, while PfSR1 and 12 are more divergent. Interestingly, we found that PfSR10 and PfSR12 possess similarity to orphan serpentine receptors of other organisms. The identification of potential parasite membrane receptors raises a new perspective for essential aspects of malaria parasite host cell infection.
Affiliation(s)
- Luciana Madeira
- Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brasil
| | - Pedro A. F. Galante
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
- Ludwig Institute for Cancer Research, São Paulo, Brasil
| | - Alexandre Budu
- Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
| | - Mauro F. Azevedo
- Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
| | - Bettina Malnic
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
| | - Célia R. S. Garcia
- Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- To whom correspondence should be addressed.
|
22
|
D'Agostino N, Traini A, Frusciante L, Chiusano ML. Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome. BMC Bioinformatics 2007; 8 Suppl 1:S9. [PMID: 17430576 PMCID: PMC1885861 DOI: 10.1186/1471-2105-8-s1-s9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The structural annotation of a genome is based either on ab initio methodologies or on similarity searches against molecules that have already been annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task, the common approach is to align species-specific full-length cDNA and EST sequences along the genomic sequences in order to define the exon/intron structure of mRNA-coding genes. RESULTS: GeneModelEST is the software proposed here for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign preliminary functional annotations to the candidate gene models. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. CONCLUSION: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it is useful for filtering out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large-scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human-curated annotation.
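The GFF input described above can be illustrated with a short sketch. It assumes the standard nine tab-separated GFF columns and a hypothetical `Parent` attribute naming the contig each exon belongs to; the actual attribute layout expected by GeneModelEST is not given in the abstract, and this is not the tool's own code.

```python
from collections import defaultdict


def exons_by_contig(gff_text, id_key="Parent"):
    """Group exon coordinates by contig from GFF-formatted text.

    id_key is a hypothetical attribute name chosen for this example.
    """
    models = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF feature line
        feature_type, start, end, attributes = (
            columns[2], columns[3], columns[4], columns[8]
        )
        if feature_type != "exon":
            continue
        attribute_map = dict(
            field.split("=", 1) for field in attributes.split(";") if "=" in field
        )
        contig = attribute_map.get(id_key)
        if contig is not None:
            models[contig].append((int(start), int(end)))
    # Sort exons by start coordinate within each contig.
    return {contig: sorted(exons) for contig, exons in models.items()}


# Hypothetical spliced-alignment records, for illustration only.
rows = [
    ("chr1", "example", "exon", "400", "520", ".", "+", ".", "Parent=contig1"),
    ("chr1", "example", "exon", "100", "250", ".", "+", ".", "Parent=contig1"),
    ("chr1", "example", "mRNA", "100", "520", ".", "+", ".", "ID=contig1"),
    ("chr2", "example", "exon", "900", "1200", ".", "-", ".", "Parent=contig2"),
]
gff_text = "##gff-version 3\n" + "\n".join("\t".join(row) for row in rows)
models = exons_by_contig(gff_text)
```

Once exons are grouped per contig in this way, each exon can be evaluated and each contig classified, which corresponds to steps i) and ii) of the analysis described in the abstract.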
Affiliation(s)
- Nunzio D'Agostino
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
| | - Alessandra Traini
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
| | - Luigi Frusciante
- Department of Soil, Plant, and Environmental Sciences, University 'Federico II', 80055 Portici, Naples, Italy
| | - Maria Luisa Chiusano
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
|
23
|
Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, Bureau T, Burr F, Costa de Oliveira A, Fuks G, Habara T, Haberer G, Han B, Harada E, Hiraki AT, Hirochika H, Hoen D, Hokari H, Hosokawa S, Hsing Y, Ikawa H, Ikeo K, Imanishi T, Ito Y, Jaiswal P, Kanno M, Kawahara Y, Kawamura T, Kawashima H, Khurana JP, Kikuchi S, Komatsu S, Koyanagi KO, Kubooka H, Lieberherr D, Lin YC, Lonsdale D, Matsumoto T, Matsuya A, McCombie WR, Messing J, Miyao A, Mulder N, Nagamura Y, Nam J, Namiki N, Numa H, Nurimoto S, O’Donovan C, Ohyanagi H, Okido T, OOta S, Osato N, Palmer LE, Quetier F, Raghuvanshi S, Saichi N, Sakai H, Sakai Y, Sakata K, Sakurai T, Sato F, Sato Y, Schoof H, Seki M, Shibata M, Shimizu Y, Shinozaki K, Shinso Y, Singh NK, Smith-White B, Takeda JI, Tanino M, Tatusova T, Thongjuea S, Todokoro F, Tsugane M, Tyagi AK, Vanavichit A, Wang A, Wing RA, Yamaguchi K, Yamamoto M, Yamamoto N, Yu Y, Zhang H, Zhao Q, Higo K, Burr B, Gojobori T, Sasaki T, for the Rice Annotation Project. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res 2007; 17:175-83. [PMID: 17210932 PMCID: PMC1781349 DOI: 10.1101/gr.5509507] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.9] [Received: 05/17/2006] [Accepted: 10/31/2006] [Indexed: 11/25/2022]
Abstract
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is approximately 32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.
Affiliation(s)
- Takeshi Itoh
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
| | - Tsuyoshi Tanaka
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Roberto A. Barrero
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Chisato Yamasaki
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Yasuyuki Fujii
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Phillip B. Hilton
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Baltazar A. Antonio
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Hideo Aono
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Rolf Apweiler
- EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom
| | - Richard Bruskiewich
- Biometrics and Bioinformatics Unit, International Rice Research Institute, DAPO Box 7777, Metro Manila, Philippines
| | - Thomas Bureau
- Department of Biology, McGill University, Montreal, Quebec H3A 1B1, Canada
| | - Frances Burr
- Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
| | | | - Galina Fuks
- Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey 08854, USA
| | - Takuya Habara
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Georg Haberer
- Institute for Bioinformatics, GSF National Research Center for Environment and Health, D-85764 Neuherberg, Germany
| | - Bin Han
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China
| | - Erimi Harada
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Aiko T. Hiraki
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Hirohiko Hirochika
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Douglas Hoen
- Department of Biology, McGill University, Montreal, Quebec H3A 1B1, Canada
| | - Hiroki Hokari
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Satomi Hosokawa
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Yue Hsing
- Institute of Botany, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Hiroshi Ikawa
- Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan
| | - Kazuho Ikeo
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Tadashi Imanishi
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan
| | - Yukiyo Ito
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Pankaj Jaiswal
- Department of Plant Breeding, Cornell University, Ithaca, New York 14853, USA
| | - Masako Kanno
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Yoshihiro Kawahara
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji-shi, Tokyo 192-0397, Japan
| | - Toshiyuki Kawamura
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Hiroaki Kawashima
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Jitendra P. Khurana
- Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi 110021, India
| | - Shoshi Kikuchi
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Setsuko Komatsu
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
- National Institute of Crop Science, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-8518, Japan
| | - Kanako O. Koyanagi
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan
| | - Hiromi Kubooka
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Damien Lieberherr
- SWISS-PROT Group, Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland
| | - Yao-Cheng Lin
- Institute of Botany, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - David Lonsdale
- EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom
| | - Takashi Matsumoto
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Akihiro Matsuya
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | | | - Joachim Messing
- Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey 08854, USA
| | - Akio Miyao
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Nicola Mulder
- EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom
| | - Yoshiaki Nagamura
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Jongmin Nam
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Nobukazu Namiki
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Hisataka Numa
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
| | - Shin Nurimoto
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Claire O’Donovan
- EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom
| | - Hajime Ohyanagi
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
- Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan
| | - Toshihisa Okido
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Satoshi OOta
- RIKEN BioResource Center, RIKEN Tsukuba Institute, Tsukuba, Ibaraki 305-0074, Japan
| | - Naoki Osato
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Lance E. Palmer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11723, USA
- Department of Molecular Genetics and Microbiology, and Center for Infectious Diseases, The State University of New York at Stony Brook, Stony Brook, New York 11794, USA
| | | | - Saurabh Raghuvanshi
- Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi 110021, India
| | - Naomi Saichi
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Hiroaki Sakai
- Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Yasumichi Sakai
- Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan
| | - Katsumi Sakata
- Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan
| | - Tetsuya Sakurai
- Metabolomics Research Group, RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan
| | - Fumihiko Sato
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Yoshiharu Sato
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Heiko Schoof
- Institute for Bioinformatics, GSF National Research Center for Environment and Health, D-85764 Neuherberg, Germany
- Technische Universität München, Genome Oriented Bioinformatics, D-85354 Freising-Weihenstephan, Germany
- Plant Computational Biology, Max-Planck-Institute for Plant Breeding Research, D 50829 Cologne, Germany
| | - Motoaki Seki
- Plant Functional Genomics Research Group, RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan
| | - Michie Shibata
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Yuji Shimizu
- Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki 305-0032, Japan
| | - Kazuo Shinozaki
- RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan
| | - Yuji Shinso
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Nagendra K. Singh
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi 110012, India
| | - Brian Smith-White
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Jun-ichi Takeda
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Motohiko Tanino
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Tatiana Tatusova
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Supat Thongjuea
- Rice Gene Discovery Unit, Kasetsart University, Nakorn Pathom 73140, Thailand
| | - Fusano Todokoro
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Mika Tsugane
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Akhilesh K. Tyagi
- Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi 110021, India
| | - Apichart Vanavichit
- Rice Gene Discovery Unit, Kasetsart University, Nakorn Pathom 73140, Thailand
| | - Aihui Wang
- The Institute for Genomic Research, Rockville, Maryland 20850, USA
| | - Rod A. Wing
- Arizona Genomics Institute, The University of Arizona, Tucson, Arizona 85721, USA
| | - Kaori Yamaguchi
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Mayu Yamamoto
- Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki 305-0854, Japan
| | - Naoyuki Yamamoto
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Yeisoo Yu
- Arizona Genomics Institute, The University of Arizona, Tucson, Arizona 85721, USA
| | - Hao Zhang
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, Koto-ku, Tokyo 135-0064, Japan
| | - Qiang Zhao
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China
| | - Kenichi Higo
- National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
- Bio-Oriented Technology Research Advancement Institution, Minato-ku, Tokyo 105-0001, Japan
| | - Benjamin Burr
- Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Takashi Gojobori
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
| | - Takuji Sasaki
- National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan
|
24
|
Fu Y, Wen TJ, Ronin YI, Chen HD, Guo L, Mester DI, Yang Y, Lee M, Korol AB, Ashlock DA, Schnable PS. Genetic dissection of intermated recombinant inbred lines using a new genetic map of maize. Genetics 2006; 174:1671-83. [PMID: 16951074 PMCID: PMC1667089 DOI: 10.1534/genetics.106.060376] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
A new genetic map of maize, ISU-IBM Map4, that integrates 2029 existing markers with 1329 new indel polymorphism (IDP) markers has been developed using intermated recombinant inbred lines (IRILs) from the intermated B73xMo17 (IBM) population. The website http://magi.plantgenomics.iastate.edu provides access to IDP primer sequences, sequences from which IDP primers were designed, optimized marker-specific PCR conditions, and polymorphism data for all IDP markers. This new gene-based genetic map will facilitate a wide variety of genetic and genomic research projects, including map-based genome sequencing and gene cloning. The mosaic structures of the genomes of 91 IRILs, an important resource for identifying and mapping QTL and eQTL, were defined. Analyses of segregation data associated with markers genotyped in three B73/Mo17-derived mapping populations (F2, Syn5, and IBM) demonstrate that allele frequencies were significantly altered during the development of the IBM IRILs. The observations that two segregation distortion regions overlap with maize flowering-time QTL suggest that the altered allele frequencies were a consequence of inadvertent selection. Detection of two-locus gamete disequilibrium provides another means to extract functional genomic data from well-characterized plant RILs.
Affiliation(s)
- Yan Fu
- Interdepartmental Genetics Graduate Program, Iowa State University, Ames, Iowa 50011-3467, USA
|
25
|
Yandeau-Nelson MD, Xia Y, Li J, Neuffer MG, Schnable PS. Unequal sister chromatid and homolog recombination at a tandem duplication of the A1 locus in maize. Genetics 2006; 173:2211-26. [PMID: 16751673 PMCID: PMC1569709 DOI: 10.1534/genetics.105.052712] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Tandemly arrayed duplicate genes are prevalent. The maize A1-b haplotype is a tandem duplication that consists of the components, alpha and beta. The rate of meiotic unequal recombination at A1-b is ninefold higher when a homolog is present than when it is absent (i.e., hemizygote). When a sequence heterologous homolog is available, 94% of recombinants (264/281) are generated via recombination with the homolog rather than with the sister chromatid. In addition, 83% (220/264) of homolog recombination events involved alpha rather than beta. These results indicate that: (1) the homolog is the preferred template for unequal recombination and (2) pairing of the duplicated segments with the homolog does not occur randomly but instead favors a particular configuration. The choice of recombination template (i.e., homolog vs. sister chromatid) affects the distribution of recombination breakpoints within a1. Rates of unequal recombination at A1-b are similar to the rate of recombination between nonduplicated a1 alleles. Unequal recombination is therefore common and is likely to be responsible for the generation of genetic variability, even within inbred lines.
Affiliation(s)
- Marna D Yandeau-Nelson
- Interdepartmental Genetics Program, Genetics, Development and Cell Biology Department, Center for Plant Genomics, Iowa State University, Ames 50011, USA
|
26
|
Windsor AJ, Mitchell-Olds T. Comparative genomics as a tool for gene discovery. Curr Opin Biotechnol 2006; 17:161-7. [PMID: 16459073 DOI: 10.1016/j.copbio.2006.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 10/25/2005] [Revised: 12/20/2005] [Accepted: 01/20/2006] [Indexed: 01/21/2023]
Abstract
With the increasing availability of data from multiple eukaryotic genome sequencing projects, attention has focused on interspecific comparisons to discover novel genes and transcribed genomic sequences. Generally, these extrinsic strategies combine ab initio gene prediction with expression and/or homology data to identify conserved gene candidates between two or more genomes. Interspecific sequence analyses have proven invaluable for the improvement of existing annotations, automation of annotation, and identification of novel coding regions and splice variants. Further, comparative genomic approaches hold the promise of improved prediction of terminal or small exons, microRNA precursors, and small peptide-encoding open reading frames, sequence elements that are difficult to identify through purely intrinsic methodologies in the absence of experimental data.
Affiliation(s)
- Aaron J Windsor
- Max-Planck-Institut fuer chemische Oekologie, Abteilung Genetik und Evolution, Hans-Knoell-Strasse 8, D-07745 Jena, Germany
|
27
|
Rabinowicz PD, Bennetzen JL. The maize genome as a model for efficient sequence analysis of large plant genomes. Curr Opin Plant Biol 2006; 9:149-56. [PMID: 16459129 DOI: 10.1016/j.pbi.2006.01.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 01/04/2006] [Accepted: 01/20/2006] [Indexed: 05/06/2023]
Abstract
The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of repetitive elements in intergenic regions. High-quality sequences of the relatively small genomes of Arabidopsis (0.14 Gbp) and rice (0.4 Gbp) have now been largely completed. The sequencing of plant genomes that have a more representative size (the mean for flowering plant genomes is 5.6 Gbp) has been seen as a daunting task, partly because of their size and partly because of the numerous highly conserved repeats. Nevertheless, creative strategies and powerful new tools have been generated recently in the plant genetics community, so that sequencing large plant genomes is now a realistic possibility. Maize (2.4-2.7 Gbp) will be the first gigabase-size plant genome to be sequenced using these novel approaches. Pilot studies on maize indicate that the new gene-enrichment, gene-finishing and gene-orientation technologies are efficient, robust and comprehensive. These strategies will succeed in sequencing the gene-space of large genome plants, and in locating all of these genes and adjacent sequences on the genetic and physical maps.
Affiliation(s)
- Pablo D Rabinowicz
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
|
28
|
Fu Y, Emrich SJ, Guo L, Wen TJ, Ashlock DA, Aluru S, Schnable PS. Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes. Proc Natl Acad Sci U S A 2005; 102:12282-7. [PMID: 16103354 PMCID: PMC1186025 DOI: 10.1073/pnas.0503394102] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 04/26/2005] [Indexed: 11/18/2022]
Abstract
Recent sequencing efforts have targeted the gene-rich regions of the maize (Zea mays L.) genome. We report the release of an improved assembly of maize assembled genomic islands (MAGIs). The 114,173 resulting contigs have been subjected to computational and physical quality assessments. Comparisons to the sequences of maize bacterial artificial chromosomes suggest that at least 97% (160 of 165) of MAGIs are correctly assembled. Because the rates at which junction-testing PCR primers for genomic survey sequences (90-92%) amplify genomic DNA are not significantly different from those of control primers (approximately 91%), we conclude that a very high percentage of genic MAGIs accurately reflect the structure of the maize genome. EST alignments, ab initio gene prediction, and sequence similarity searches of the MAGIs are available at the Iowa State University MAGI web site. This assembly contains 46,688 ab initio predicted genes. The expression of almost half (628 of 1,369) of a sample of the predicted genes that lack expression evidence was validated by RT-PCR. Our analyses suggest that the maize genome contains between approximately 33,000 and approximately 54,000 expressed genes. Approximately 5% (32 of 628) of the maize transcripts discovered do not have detectable paralogs among maize ESTs or detectable homologs from other species in the GenBank NR nucleotide/protein database. Analyses therefore suggest that this assembly of the maize genome contains approximately 350 previously uncharacterized expressed genes. We hypothesize that these "orphans" evolved quickly during maize evolution and/or domestication.
Affiliation(s)
- Yan Fu
- Interdepartmental Genetics Graduate Program, L. H. Baker Center for Bioinformatics and Biological Statistics, Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
|