1
Carbonnel S, Cornelis S, Hazak O. The CLE33 peptide represses phloem differentiation via autocrine and paracrine signaling in Arabidopsis. Commun Biol 2023; 6:588. [PMID: 37280369 DOI: 10.1038/s42003-023-04972-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/23/2023] [Indexed: 06/08/2023]
Abstract
Plant meristems require a constant supply of photoassimilates and hormones to the dividing meristematic cells. In the growing root, this supply is delivered by protophloem sieve elements. Due to its preeminent function for the root apical meristem, protophloem is the first tissue to differentiate. This process is regulated by a genetic circuit involving, on one side, the positive regulators DOF transcription factors, OCTOPUS (OPS), and BREVIX RADIX (BRX) and, on the other side, the negative regulators CLAVATA3/EMBRYO SURROUNDING REGION RELATED (CLE) peptides and their cognate receptors, the BARELY ANY MERISTEM (BAM) receptor-like kinases. brx and ops mutants harbor a discontinuous protophloem that can be fully rescued by mutation in BAM3, but is only partially rescued when all three known phloem-specific CLE genes, CLE25/26/45, are simultaneously mutated. Here we identify a CLE gene closely related to CLE45, named CLE33. We show that the double mutant cle33cle45 fully suppresses the brx and ops protophloem phenotypes. CLE33 orthologs are found in basal angiosperms, monocots, and eudicots, and the gene duplication that gave rise to CLE45 in Arabidopsis and other Brassicaceae appears to be a recent event. We have thus discovered a previously unidentified Arabidopsis CLE gene that is an essential player in protophloem formation.
Affiliation(s)
- Samy Carbonnel
  - Department of Biology, University of Fribourg, Chemin du Musee 10, 1700, Fribourg, Switzerland
- Salves Cornelis
  - Department of Biology, University of Fribourg, Chemin du Musee 10, 1700, Fribourg, Switzerland
- Ora Hazak
  - Department of Biology, University of Fribourg, Chemin du Musee 10, 1700, Fribourg, Switzerland
2
Flavell RB. Perspective: 50 years of plant chromosome biology. Plant Physiology 2021; 185:731-753. [PMID: 33604616 PMCID: PMC8133586 DOI: 10.1093/plphys/kiaa108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 12/04/2020] [Indexed: 06/12/2023]
Abstract
The past 50 years have been the greatest era of plant science discovery, and most of the discoveries have emerged from or been facilitated by our knowledge of plant chromosomes. At last we have descriptive and mechanistic outlines of the information in chromosomes that programs plant life. We had almost no such information 50 years ago, when few had isolated DNA from any plant species. The important features of genes have been revealed through whole-genome comparative genomics and testing of variants using transgenesis. Progress has been enabled by the development of technologies that had to be invented and then become widely available. Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) have played extraordinary roles as model species. Unexpected evolutionary dramas were uncovered when we learned that chromosomes have to manage constantly the vast numbers of potentially mutagenic families of transposons and other repeated sequences. The chromatin-based transcriptional and epigenetic mechanisms that co-evolved to manage the evolutionary drama, as well as gene expression and 3-D nuclear architecture, have been elucidated over the past 20 years. This perspective traces some of the major developments with which I have become particularly familiar while seeking ways to improve crop plants. I draw some conclusions from this look back over 50 years during which the scientific community has (i) exposed how chromosomes guard, read out, control, recombine, and transmit information that programs plant species, large and small, weed and crop, and (ii) modified the information in chromosomes for the purposes of genetic, physiological, and developmental analyses and plant improvement.
Affiliation(s)
- Richard B Flavell
  - International Wheat Yield Partnership, 1500 Research Parkway, College Station, TX 77843, USA
3
Zebell SG. A broad view: Dick Flavell. Plant Physiology 2021; 185:727-730. [PMID: 33822223 PMCID: PMC8133605 DOI: 10.1093/plphys/kiaa111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
4
In silico identification and structure function analysis of a putative coclaurine N-methyltransferase from Aristolochia fimbriata. Comput Biol Chem 2020; 85:107201. [DOI: 10.1016/j.compbiolchem.2020.107201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2019] [Revised: 12/31/2019] [Accepted: 01/08/2020] [Indexed: 11/22/2022]
5
Yi X, Yang Y, Wu P, Xu X, Li W. Alternative splicing events during adipogenesis from hMSCs. J Cell Physiol 2019; 235:304-316. [PMID: 31206189 DOI: 10.1002/jcp.28970] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/01/2019] [Revised: 05/28/2019] [Accepted: 05/29/2019] [Indexed: 12/22/2022]
Abstract
Adipogenesis, the developmental process in which progenitor cells differentiate into adipocytes, leads to fat metabolic disorders. Alternative splicing (AS), a ubiquitous regulatory mechanism of gene expression, allows the generation of more than one unique messenger RNA (mRNA) species from a single gene. To date, alternative splicing events (ASE) during adipogenesis from human mesenchymal stem cells (hMSCs) have not been fully elucidated. We performed RNA-Seq coupled with bioinformatics analysis to identify the differentially expressed AS genes and events during adipogenesis from hMSCs. A global survey identified 1262, 1181, 1167, and 1227 ASE at 7, 14, 21, and 28 days of adipogenesis, respectively; these involved the most common types of AS, including cassette exon, alt3, and alt5, with cassette exon the most prevalent. Interestingly, the 122 differentially expressed ASE mapped to 118 genes, and three genes, ACTN1 (alt3 and cassette), LRP1 (alt3 and alt5), and LTBP4 (cassette, cassette_multi, and unknown), appeared in multiple AS types during adipogenesis. Whereas all the other identified ASE of LRP1 occurred in the extracellular topological domain, the significantly differentially expressed alt3 (84) event in the transmembrane domain was the potential key event during adipogenesis. Overall, we have, for the first time, conducted global transcriptional profiling during adipogenesis of hMSCs to identify differentially expressed ASE and ASE-related genes. These findings provide an extensive set of ASE as potential regulators of adipogenesis and potential targets for future molecular research into adipogenesis-related metabolic disorders.
Affiliation(s)
- Xia Yi
  - Jiangxi Provincial Key Laboratory of Systems Biomedicine, Jiujiang University, Jiujiang, China
- Yunzhong Yang
  - Beijing Yuanchuangzhilian Technology Development Co., Ltd, Beijing, China
- Ping Wu
  - Jiangxi Provincial Key Laboratory of Systems Biomedicine, Jiujiang University, Jiujiang, China
- Xiaoyuan Xu
  - Jiangxi Provincial Key Laboratory of Systems Biomedicine, Jiujiang University, Jiujiang, China
- Weidong Li
  - Jiangxi Provincial Key Laboratory of Systems Biomedicine, Jiujiang University, Jiujiang, China
6
Davies JP, Christensen CA. Developing Transgenic Agronomic Traits for Crops: Targets, Methods, and Challenges. Methods Mol Biol 2019; 1864:343-365. [PMID: 30415346 DOI: 10.1007/978-1-4939-8778-8_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
The last two decades have witnessed a surge of investment by the agricultural biotechnology industry in the development of transgenic agronomic traits. These are traits that improve yield performance by modifying endogenous physiological processes such as energy capture, nutrient utilization, and stress tolerance. In this chapter we provide a foundation for understanding these fundamental processes and then outline approaches that have been taken to use this knowledge for yield improvement. We characterize the current status of product development pipelines in the industry and illustrate the trait discovery process with three important examples: bacterial cold-shock proteins, alanine aminotransferase, and auxin-regulated genes. The challenges with developing and commercializing an agronomic trait product are discussed.
Affiliation(s)
- John P Davies
  - Corteva Agriscience™, Agriculture Division of DowDuPont™, Indianapolis, IN, USA
- Cory A Christensen
  - Corteva Agriscience™, Agriculture Division of DowDuPont™, Indianapolis, IN, USA
7
Phylogenetic analyses and in-seedling expression of ammonium and nitrate transporters in wheat. Sci Rep 2018; 8:7082. [PMID: 29728590 PMCID: PMC5935732 DOI: 10.1038/s41598-018-25430-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/07/2017] [Accepted: 04/18/2018] [Indexed: 02/03/2023]
Abstract
Plants deploy several ammonium transporter (AMT) and nitrate transporter (NRT) genes to acquire NH4+ and NO3− from the soil into the roots and then transport them to other plant organs. Coding sequences of wheat genes obtained from ENSEMBL were aligned to known AMT and NRT sequences of Arabidopsis, barley, maize, rice, and wheat to retrieve homologous genes. Bayesian phylogenetic relationships among these genes showed distinct classification of sequences with significant homology to NRT1, NRT2, and NRT3 (NAR2). Inter-species gene duplication analysis showed that eight AMT and 77 NRT genes were orthologous to the AMT and NRT genes of the aforementioned plant species. Expression patterns of these genes were studied via whole transcriptome sequencing of 21-day-old seedlings of five spring wheat lines. Eight AMT and 52 NRT genes were differentially expressed between root and shoot, and 131 genes were expressed in neither the root nor the shoot of 21-day-old seedlings. Homeologous genes in the A, B, and D genomes, characterized by high sequence homology, revealed that their counterparts exhibited different expression patterns. This complement and evolutionary relationship of wheat AMT and NRT genes is expected to help in the development of wheat germplasm with increased efficiency in nitrogen uptake and usage.
8
Rawal HC, Kumar S, Mithra S V A, Solanke AU, Nigam D, Saxena S, Tyagi A, V S, Yadav NR, Kalia P, Singh NP, Singh NK, Sharma TR, Gaikwad K. High Quality Unigenes and Microsatellite Markers from Tissue Specific Transcriptome and Development of a Database in Clusterbean (Cyamopsis tetragonoloba, L. Taub). Genes (Basel) 2017; 8:genes8110313. [PMID: 29120386 PMCID: PMC5704226 DOI: 10.3390/genes8110313] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 08/23/2017] [Revised: 10/23/2017] [Accepted: 11/06/2017] [Indexed: 12/23/2022]
Abstract
Clusterbean (Cyamopsis tetragonoloba L. Taub) is an important industrial, vegetable and forage crop. This crop owes its commercial importance to the presence of guar gum (galactomannans) in its endosperm, which is used as a lubricant in a range of industries. Despite its relevance to agriculture and industry, the genomic resources available for this crop are limited. Therefore, the present study was undertaken to generate an RNA-Seq based transcriptome from leaf, shoot, and flower tissues. A total of 145 million high quality Illumina reads were assembled using Trinity into 127,706 transcripts and 48,007 non-redundant high quality (HQ) unigenes. We annotated 79% of the unigenes against Plant Genes from the National Center for Biotechnology Information (NCBI), Swiss-Prot, Pfam, gene ontology (GO) and KEGG databases. Among the annotated unigenes, 30,020 were assigned 116,964 GO terms, 9984 were assigned EC numbers and 6111 were assigned to 137 KEGG pathways. At different fragments per kilobase of transcript per million fragments sequenced (FPKM) levels, genes were more highly expressed in flower tissue, followed by shoot and leaf. Additionally, we identified 8687 potential simple sequence repeats (SSRs) with an average frequency of one SSR per 8.75 kb. A total of 28 amplified SSRs in 21 clusterbean genotypes resulted in polymorphism in 13 markers with an average polymorphic information content (PIC) of 0.21. We also constructed a database named ‘ClustergeneDB’ for easy retrieval of unigenes and the microsatellite markers. The tissue specific genes identified and the molecular marker resources developed in this study are expected to aid in the genetic improvement of clusterbean for its end use.
Affiliation(s)
- Hukam C Rawal
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Shrawan Kumar
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Amitha Mithra S V
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Amolkumar U Solanke
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Deepti Nigam
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Swati Saxena
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Anshika Tyagi
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Sureshkumar V
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Neelam R Yadav
  - Department of Biotechnology and Molecular Biology, CCS Haryana Agricultural University, Hisar 125004, India
- Pritam Kalia
  - ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Tilak Raj Sharma
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
- Kishor Gaikwad
  - ICAR-National Research Centre on Plant Biotechnology, New Delhi 110012, India
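As an illustration of the polymorphic information content (PIC) statistic reported in the clusterbean abstract above, the following minimal sketch computes PIC for a single marker from its allele frequencies. It assumes the widely used Botstein et al. formula, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; the abstract does not state which formula the authors used, and the function name and the example allele frequencies are hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(allele_freqs):
    """Return the PIC of one marker from its allele frequencies.

    Assumes the Botstein-style formula:
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    The frequencies must be non-negative and sum to 1.
    """
    freqs = list(allele_freqs)
    if any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Expected homozygosity term.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical markers, not data from the study.
    print(polymorphic_information_content([0.5, 0.5]))  # two equally common alleles
    print(polymorphic_information_content([1.0]))       # monomorphic marker
```

Under this formula a monomorphic marker scores 0 and a marker with two equally frequent alleles scores 0.375, so an average PIC of 0.21 would indicate markers of modest informativeness.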
9
Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017; 18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion.
RESULTS: We present an automated gene prediction pipeline, Seqping, that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).
CONCLUSIONS: Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.
Affiliation(s)
- Kuang-Lim Chan
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
- Rozana Rosli
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
- Tatiana V. Tatarinova
  - Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA USA
- Michael Hogan
  - Orion Genomics, 4041 Forest Park Avenue, St. Louis, MO 63108 USA
- Mohd Firdaus-Raih
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
- Eng-Ti Leslie Low
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
10
Genome-Wide Identification and Characterization of the LRR-RLK Gene Family in Two Vernicia Species. Int J Genomics 2015; 2015:823427. [PMID: 26783513 PMCID: PMC4691485 DOI: 10.1155/2015/823427] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/15/2015] [Accepted: 11/17/2015] [Indexed: 11/17/2022]
Abstract
Leucine-rich repeat receptor-like kinases (LRR-RLKs) make up the largest group of RLKs in plants and play important roles in many key biological processes such as pathogen response and signal transduction. To date, most studies on LRR-RLKs have been conducted on model plants. Here, we identified 236 and 230 LRR-RLKs in two industrial oil-producing trees, Vernicia fordii and Vernicia montana, respectively. Sequence alignment analyses showed that the homology of the RLK domain (23.81%) was greater than that of the LRR domain (9.51%) among the Vf/VmLRR-RLKs. The conserved motif of the LRR domain in Vf/VmLRR-RLKs matched the known plant LRR consensus sequence well but differed at the third-to-last amino acid (W or L). Phylogenetic analysis revealed that Vf/VmLRR-RLKs were grouped into 16 subclades. We characterized the expression profiles of Vf/VmLRR-RLKs in various tissue types including root, leaf, petal, and kernel. Further investigation revealed that Vf/VmLRR-RLK orthologous genes mainly showed similar expression patterns in response to tree wilt disease, except for 4 pairs of Vf/VmLRR-RLKs that showed opposite expression trends. These results represent an extensive evaluation of LRR-RLKs in two industrial oil trees and will be useful for further functional studies on these proteins.
11
Zhang X, Feng H, Feng C, Xu H, Huang X, Wang Q, Duan X, Wang X, Wei G, Huang L, Kang Z. Isolation and characterisation of cDNA encoding a wheat heavy metal-associated isoprenylated protein involved in stress responses. Plant Biology (Stuttgart, Germany) 2015; 17:1176-86. [PMID: 25951496 DOI: 10.1111/plb.12344] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/23/2015] [Accepted: 05/01/2015] [Indexed: 05/03/2023]
Abstract
In cells, metallochaperones are important proteins that safely transport metal ions. Heavy metal-associated isoprenylated plant proteins (HIPPs) are metallochaperones that contain a metal binding domain and a CaaX isoprenylation motif at the carboxy-terminal end. To investigate the roles of wheat heavy metal-associated isoprenylated plant protein (TaHIPP) genes in plant development and in stress responses, we isolated cDNA encoding the wheat TaHIPP1 gene, which contains a heavy metal-associated domain, nuclear localisation signals and an isoprenylation motif (CaaX motif). Quantitative real-time PCR analysis indicated that the TaHIPP1 gene was differentially expressed under biotic and abiotic stresses. Specifically, TaHIPP1 expression was up-regulated by ABA exposure or wounding. Additionally, TaHIPP1 over-expression in yeast (Schizosaccharomyces pombe) significantly increased the cell growth rate under Cu2+ and high salinity stresses. The nuclear localisation of the protein was confirmed with confocal laser scanning microscopy of epidermal onion cells after particle bombardment with chimeric TaHIPP1-GFP constructs. In addition, TaHIPP1 was shown to enhance the susceptibility of wheat to Pst as determined by virus-induced gene silencing. These data indicate that TaHIPP1 is an important component in defence signalling pathways and may play a crucial role in the defence response of wheat to biotic and certain abiotic stresses.
Affiliation(s)
- X Zhang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Life Science, Northwest A&F University, Yangling, Shaanxi, China
- H Feng
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- C Feng
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Life Science, Northwest A&F University, Yangling, Shaanxi, China
- H Xu
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Life Science, Northwest A&F University, Yangling, Shaanxi, China
- X Huang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- Q Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Life Science, Northwest A&F University, Yangling, Shaanxi, China
- X Duan
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Life Science, Northwest A&F University, Yangling, Shaanxi, China
- X Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- G Wei
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- L Huang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- Z Kang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
12
Chauhan R, Jasrai Y, Pandya H. In Silico Analysis for Five Major Cereal Crops Phytocystatins. Interdiscip Sci 2015; 7:233-41. [PMID: 26267706 DOI: 10.1007/s12539-015-0264-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/24/2013] [Revised: 01/15/2014] [Accepted: 02/07/2014] [Indexed: 11/28/2022]
Abstract
Five major cereal crops (rice, wheat, maize, barley and sorghum) are continuously threatened by a multitude of pathogens and other disorders. Cystatins play a pivotal role in determining the plant response. Bioinformatics tools were used to elucidate the phylogenetic relationships of phytocystatins from these five cereal crops based on amino acid sequence information, and their secondary and tertiary structures were investigated for structural comparisons. Twenty-eight distinct phytocystatins from 28 plant species were investigated. Phytocystatins could be divided into five distinct phylogenetic groups. Across the five major cereal crops, their structural features were highly conserved, and their amino acid sequence similarities ranged from 48 to 86%. A new highly conserved amino acid sequence motif, YEAKxWxKxF, at the C-terminal end and unique to phytocystatins was identified. The predicted 3D homology models showed a high conservation of the general central structure of the phytocystatins, i.e., the 4-5 anti-parallel β-sheets wrapping halfway round a single central α-helix, and particularly of the three active site regions: the N-terminal and the first and second hairpin loops. Any structural differences seem to be mainly in the length of the N- and C-terminal, the length of the second hairpin loop and the fifth β-sheet. Via docking experiments, small heterogeneities were observed in the vicinity of the OC-I active sites that seemed to be influential in the binding process and stability of the resultant inhibitor-protease complex.
Affiliation(s)
- Rupal Chauhan
  - Applied Botany Center, Department of Botany, University School of Sciences, Gujarat University, Ahmadabad, Gujarat, 380 009, India
- Yogesh Jasrai
  - Applied Botany Center, Department of Botany, University School of Sciences, Gujarat University, Ahmadabad, Gujarat, 380 009, India
- Himanshu Pandya
  - Applied Botany Center, Department of Botany, University School of Sciences, Gujarat University, Ahmadabad, Gujarat, 380 009, India
13
Warren RL, Keeling CI, Yuen MMS, Raymond A, Taylor GA, Vandervalk BP, Mohamadi H, Paulino D, Chiu R, Jackman SD, Robertson G, Yang C, Boyle B, Hoffmann M, Weigel D, Nelson DR, Ritland C, Isabel N, Jaquish B, Yanchuk A, Bousquet J, Jones SJM, MacKay J, Birol I, Bohlmann J. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. The Plant Journal: for Cell and Molecular Biology 2015; 83:189-212. [PMID: 26017574 DOI: 10.1111/tpj.12886] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Received: 02/24/2015] [Accepted: 05/15/2015] [Indexed: 05/21/2023]
Abstract
White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.
Affiliation(s)
- René L Warren
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Christopher I Keeling
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Macaire Man Saint Yuen
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Anthony Raymond
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Greg A Taylor
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Benjamin P Vandervalk
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Hamid Mohamadi
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Daniel Paulino
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Readman Chiu
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Shaun D Jackman
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Gordon Robertson
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Chen Yang
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Brian Boyle
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
- Margarete Hoffmann
  - Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
- Detlef Weigel
  - Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
- David R Nelson
  - Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Carol Ritland
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Nathalie Isabel
  - Natural Resources Canada, Laurentian Forestry Centre, Québec, QC, G1V 4C7, Canada
- Barry Jaquish
  - British Columbia Ministry of Forests, Lands, and Natural Resource Operations, Victoria, BC, V8W 9C2, Canada
- Alvin Yanchuk
  - British Columbia Ministry of Forests, Lands, and Natural Resource Operations, Victoria, BC, V8W 9C2, Canada
- Jean Bousquet
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
- Steven J M Jones
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- John MacKay
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Inanc Birol
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Joerg Bohlmann
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
14
|
Rodríguez-García MJ, Machado V, Galián J. Identification and characterisation of putative seminal fluid proteins from male reproductive tissue EST libraries in tiger beetles. BMC Genomics 2015; 16:391. [PMID: 25981911 PMCID: PMC4434525 DOI: 10.1186/s12864-015-1619-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/25/2014] [Accepted: 05/05/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The study of proteins transferred through semen can provide important information for biological questions such as adaptive evolution, the origin of new species and species richness. The objective of this study was to identify seminal fluid proteins (SFPs) that may contribute to the study of the reproductive system of tiger beetles (cicindelids), a group of more than 2,500 species distributed worldwide that occupy a great diversity of habitats. RESULTS Two cDNA libraries were constructed from the male gonads of Calomera littoralis and Cephalota litorea. Expressed sequence tags (ESTs) were analysed by bioinformatics approaches and 14 unigenes were selected as candidate SFPs, which were submitted to Reverse Transcription Polymerase Chain Reaction (RT-PCR) to identify patterns of tissue-specific expression. We have identified four novel putative SFPs of cicindelids, of which similarity searches did not show homologues with known function. However, two of the protein classes (immune response and hormone) predicted by Protfun are similar to SFPs reported in other insects. Searches for homology in other cicindelids showed one lineage specific SFPs (rapidly evolving proteins), only present in the closely related species C. littoralis and Lophyra flexuosa and two conserved SFP present in other tiger beetles species tested. CONCLUSIONS This work represents the first characterisation of putative SFPs in Adephagan species of the order Coleoptera. The results will serve as a foundation for further studies aimed to understand gene (and protein) functions and their evolutionary implications in this group of ecologically relevant beetles.
Affiliation(s)
- María Juliana Rodríguez-García
- Department of Zoology and Physical Anthropology, Faculty of Veterinary, University of Murcia, Campus Mare Nostrum, E-30100, Murcia, Spain.
| | - Vilmar Machado
- Department of Zoology and Physical Anthropology, Faculty of Veterinary, University of Murcia, Campus Mare Nostrum, E-30100, Murcia, Spain.
| | - José Galián
- Department of Zoology and Physical Anthropology, Faculty of Veterinary, University of Murcia, Campus Mare Nostrum, E-30100, Murcia, Spain.
|
15
|
Yao QY, Xia EH, Liu FH, Gao LZ. Genome-wide identification and comparative expression analysis reveal a rapid expansion and functional divergence of duplicated genes in the WRKY gene family of cabbage, Brassica oleracea var. capitata. Gene 2014; 557:35-42. [PMID: 25481634 DOI: 10.1016/j.gene.2014.12.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 10/13/2014] [Revised: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/18/2022]
Abstract
WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes.
Affiliation(s)
- Qiu-Yang Yao
- Laboratory of Plant Breeding and Utilization, Yunnan University, Kunming 650091, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; University of the Chinese Academy of Sciences, Beijing 100039, China
| | - En-Hua Xia
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; University of the Chinese Academy of Sciences, Beijing 100039, China
| | - Fei-Hu Liu
- Laboratory of Plant Breeding and Utilization, Yunnan University, Kunming 650091, China.
| | - Li-Zhi Gao
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China.
|
16
|
High-throughput sequencing and de novo assembly of Brassica oleracea var. Capitata L. for transcriptome analysis. PLoS One 2014; 9:e92087. [PMID: 24682075 PMCID: PMC3969326 DOI: 10.1371/journal.pone.0092087] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 11/26/2013] [Accepted: 02/18/2014] [Indexed: 12/28/2022] Open
Abstract
Background The cabbage, Brassica oleracea var. capitata L., has a distinguishable phenotype within the genus Brassica. Despite the economic and genetic importance of cabbage, there is little genomic data for cabbage, and most studies of Brassica are focused on other species or other B. oleracea subspecies. The lack of genomic data for cabbage, a non-model organism, hinders research on its molecular biology. Hence, the construction of reliable transcriptomic data based on high-throughput sequencing technologies is needed to enhance our understanding of cabbage and provide genomic information for future work. Methodology/Principal Findings We constructed cDNAs from total RNA isolated from the roots, leaves, flowers, seedlings, and calcium-limited seedling tissues of two cabbage genotypes: 102043 and 107140. We sequenced a total of six different samples using the Illumina HiSeq platform, producing 40.5 Gbp of sequence data comprising 401,454,986 short reads. We assembled 205,046 transcripts (≥ 200 bp) using the Velvet and Oases assembler and predicted 53,562 loci from the transcripts. We annotated 35,274 of the loci with 55,916 plant peptides in the Phytozome database. The average length of the annotated loci was 1,419 bp. We confirmed the reliability of the sequencing assembly using reverse-transcriptase PCR to identify tissue-specific gene candidates among the annotated loci. Conclusion Our study provides valuable transcriptome sequence data for B. oleracea var. capitata L., offering a new resource for studying B. oleracea and closely related species. Our transcriptomic sequences will enhance the quality of gene annotation and functional analysis of the cabbage genome and serve as a material basis for future genomic research on cabbage. The sequencing data from this study can be used to develop molecular markers and to identify the extreme differences among the phenotypes of different species in the genus Brassica.
|
17
|
The function and properties of the transcriptional regulator COS1 in Magnaporthe oryzae. Fungal Biol 2013; 117:239-49. [DOI: 10.1016/j.funbio.2013.01.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/09/2012] [Revised: 12/22/2012] [Accepted: 01/27/2013] [Indexed: 11/20/2022]
|
18
|
Gibson AK, Smith Z, Fuqua C, Clay K, Colbourne JK. Why so many unknown genes? Partitioning orphans from a representative transcriptome of the lone star tick Amblyomma americanum. BMC Genomics 2013; 14:135. [PMID: 23445305 PMCID: PMC3616916 DOI: 10.1186/1471-2164-14-135] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 09/24/2012] [Accepted: 02/21/2013] [Indexed: 11/10/2022] Open
Abstract
Background Genomic resources within the phylum Arthropoda are largely limited to the true insects but are beginning to include unexplored subphyla, such as the Crustacea and Chelicerata. Investigations of these understudied taxa uncover high frequencies of orphan genes, which lack detectable sequence homology to genes in pre-existing databases. The ticks (Acari: Chelicerata) are one such understudied taxon for which genomic resources are urgently needed. Ticks are obligate blood-feeders that vector major diseases of humans, domesticated animals, and wildlife. In analyzing a transcriptome of the lone star tick Amblyomma americanum, one of the most abundant disease vectors in the United States, we find a high representation of unannotated sequences. We apply a general framework for quantifying the origin and true representation of unannotated sequences in a dataset and for evaluating the biological significance of orphan genes. Results Expressed sequence tags (ESTs) were derived from different life stages and populations of A. americanum and combined with ESTs available from GenBank to produce 14,310 ESTs, over twice the number previously available. The vast majority (71%) has no sequence homology to proteins archived in UniProtKB. We show that poor sequence or assembly quality is not a major contributor to this high representation by orphan genes. Moreover, most unannotated sequences are functional: a microarray experiment demonstrates that 59% of functional ESTs are unannotated. Lastly, we attempt to further annotate our EST dataset using genomic datasets from other members of the Acari, including Ixodes scapularis, four other tick species and the mite Tetranychus urticae. We find low homology with these species, consistent with significant divergence within this subclass. Conclusions We conclude that the abundance of orphan genes in A. americanum likely results from 1) taxonomic isolation stemming from divergence within the tick lineage and limited genomic resources for ticks and 2) lineage-specific genes needing functional genomic studies to evaluate their association with the unique biology of ticks. The EST sequences described here will contribute substantially to the development of tick genomics. Moreover, the framework provided for the evaluation of orphan genes can guide analyses of future transcriptome sequencing projects.
Affiliation(s)
- Amanda K Gibson
- Department of Biology, Indiana University, Bloomington, IN 47405, USA.
|
19
|
Ahmed NU, Park JI, Jung HJ, Seo MS, Kumar TS, Lee IH, Nou IS. Identification and characterization of stress resistance related genes of Brassica rapa. Biotechnol Lett 2012; 34:979-87. [PMID: 22286206 DOI: 10.1007/s10529-012-0860-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 11/22/2011] [Accepted: 01/03/2012] [Indexed: 11/29/2022]
Abstract
Two biotic stress resistance related genes from the full-length cDNA library of Brassica rapa cv. Osome were identified from EST analysis and determined to be pathogenesis-related (PR) 12 Brassica defensin-like family protein (BrDLFP) and PR-10 Brassica Betv1 allergen family protein (BrBetv1AFP) after sequence analysis and homology study with other stress resistance related same family genes. In the expression analysis, both genes expressed in different organs and during all developmental growth stages in healthy plants. Expression of BrDLFP significantly increased and BrBetv1AFP gradually decreased after infection with Pectobacterium carotovorum subsp. carotovorum in Chinese cabbage. Expression of these two genes significantly changed after cold, salt, drought and ABA stress treatments. These two PR genes may therefore be involved in the plant resistance against biotic and abiotic stresses.
Affiliation(s)
- Nasar Uddin Ahmed
- Department of Horticulture, Sunchon National University, 413 Jungangno, Suncheon, Jeonnam 540-742, Republic of Korea
|
20
|
Han B, Xu S, Xie YJ, Huang JJ, Wang LJ, Yang Z, Zhang CH, Sun Y, Shen WB, Xie GS. ZmHO-1, a maize haem oxygenase-1 gene, plays a role in determining lateral root development. Plant Science: An International Journal of Experimental Plant Biology 2012; 184:63-74. [PMID: 22284711 DOI: 10.1016/j.plantsci.2011.12.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/07/2011] [Revised: 12/08/2011] [Accepted: 12/15/2011] [Indexed: 05/04/2023]
Abstract
Previous results revealed that haem oxygenase-1 (HO-1)/carbon monoxide (CO) system is involved in auxin-induced adventitious root formation. In this report, a cDNA for the gene ZmHO-1, encoding an HO-1 protein, was cloned from Zea mays seedlings. ZmHO-1 has a conserved HO signature sequence and shares highest homology with rice SE5 (OsHO-1) protein. We further discovered that N-1-naphthylacetic acid (NAA), haemin, and CO aqueous solution, led to the induction of ZmHO-1 expression as well as the thereafter promotion of lateral root development. These effects were specific for ZmHO-1 since the potent HO-1 inhibitor zinc protoporphyrin IX (ZnPPIX) differentially blocked the above actions. The addition of haemin and CO were able to reverse the auxin depletion-triggered inhibition of lateral root formation as well as the decreased ZmHO-1 transcripts. Molecular evidence showed that the haemin- or CO-mediated the modulation of target genes responsible for lateral root formation, including ZmCDK and ZmCKI2, could be blocked by ZnPPIX. Overexpression of ZmHO-1 in transgenic Arabidopsis plants resulted in promotion of lateral root development as well as the modulation of cell cycle regulatory gene expressions. Overall, our results suggested that a maize HO-1 gene is required for the lateral root formation.
Affiliation(s)
- Bin Han
- Rubber Research Institute, Chinese Academy of Tropical Agricultural Sciences, Hainan 571737, China
|
21
|
Li Z, Zhang Z, Yan P, Huang S, Fei Z, Lin K. RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genomics 2011; 12:540. [PMID: 22047402 PMCID: PMC3219749 DOI: 10.1186/1471-2164-12-540] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Received: 03/10/2011] [Accepted: 11/02/2011] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. RESULTS The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. CONCLUSIONS We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
Affiliation(s)
- Zhen Li
- College of Life Sciences, Beijing Normal University, 19 Xinjiekouwai Street, Beijing, 100875, China
|
22
|
Haas BJ, Zeng Q, Pearson MD, Cuomo CA, Wortman JR. Approaches to Fungal Genome Annotation. Mycology 2011; 2:118-141. [PMID: 22059117 PMCID: PMC3207268 DOI: 10.1080/21501203.2011.606851] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 01/08/2023] Open
Abstract
Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.
Affiliation(s)
- Brian J Haas
- Genome Sequencing and Analysis Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, U.S.A
|
23
|
Xia Z, Xu H, Zhai J, Li D, Luo H, He C, Huang X. RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis. Plant Molecular Biology 2011; 77:299-308. [PMID: 21811850 DOI: 10.1007/s11103-011-9811-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 11/27/2010] [Accepted: 07/14/2011] [Indexed: 05/05/2023]
Abstract
Hevea brasiliensis, being the only source of commercial natural rubber, is an extremely economically important crop. In an effort to facilitate biological, biochemical and molecular research in rubber biosynthesis, here we report the use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. The sequencing output generated more than 12 million reads with an average length of 90 nt. In total 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled through de novo transcriptome assembly. Out of 13,807 H. brasiliensis cDNA sequences deposited in Genbank of the National Center for Biotechnology Information (NCBI) (as of Feb 2011), 11,746 sequences (84.5%) could be matched with the assembled unigenes through nucleotide BLAST. The assembled sequences were annotated with gene descriptions, Gene Ontology (GO) and Clusters of Orthologous Group (COG) terms. In all, 37,432 unigenes were successfully annotated, of which 24,545 (65.5%) aligned to Ricinus communis proteins. Furthermore, the annotated unigenes were functionally classified according to the GO, COG and Kyoto Encyclopedia of Genes and Genomes databases. Our data provides the most comprehensive sequence resource available for the study of rubber trees as well as demonstrates effective use of Illumina sequencing and de novo transcriptome assembly in a species lacking genomic information.
Affiliation(s)
- Zhihui Xia
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources/Institute of BioScience and Technology, College of Agriculture, Hainan University, Haikou, 570228, People's Republic of China
|
24
|
Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ. A white spruce gene catalog for conifer genome analyses. Plant Physiology 2011; 157:14-28. [PMID: 21730200 PMCID: PMC3165865 DOI: 10.1104/pp.111.179663] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Received: 05/09/2011] [Accepted: 06/24/2011] [Indexed: 05/18/2023]
Abstract
Several angiosperm plant genomes, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla reduces our understanding of plant evolution and restricts the potential impacts of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag sequencing and full-length cDNA sequencing to facilitate genome characterizations, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. Expressed sequence tags, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. We estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes owing to incomplete, partially sequenced, and unsampled transcripts and that its transcriptome could span up to 47 Mb. These estimates are in the same range as the Arabidopsis and rice transcriptomes. Next-generation methods confirmed and enhanced the catalog by providing deeper coverage for rare transcripts, by extending many incomplete clusters, and by augmenting the overall transcriptome coverage to 38 Mb of unique sequence. Genomic sample sequencing at 8.5% of the 19.8-Gb P. glauca genome identified 1,495 clusters representing highly repeated sequences among the cDNA clusters. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.
|
25
|
Schoof H. Towards Interoperability in Genome Databases: The MAtDB (MIPS Arabidopsis Thaliana Database) Experience. Comp Funct Genomics 2011; 4:255-8. [PMID: 18629123 PMCID: PMC2447410 DOI: 10.1002/cfg.278] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/29/2003] [Revised: 02/05/2003] [Accepted: 02/06/2003] [Indexed: 11/09/2022] Open
Abstract
Increasing numbers of whole-genome sequences are available, but to interpret them fully requires more than listing all genes. Genome databases are faced with the challenges of integrating heterogenous data and enabling data mining. In comparison to a data warehousing approach, where integration is achieved through replication of all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These are important in a field where new data is generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data repository towards creating an integrated knowledge resource. To this end, the Arabidopsis genome has been a backbone against which to structure and integrate heterogenous data. The challenges to be met are continuous updating of data, the design of flexible data models that can evolve with new data, the integration of heterogenous data, e.g. through the use of ontologies, comprehensive views and visualization of complex information, simple interfaces for application access locally or via the Internet, and knowledge transfer across species.
Affiliation(s)
- Heiko Schoof
- Technische Universität München, Lehrstuhl genomorientierte Bioinformatik, Wissenschaftszentrum Weihenstephan, Freising 85350, Germany
|
26
|
Doyle CE, Donaldson ME, Morrison EN, Saville BJ. Ustilago maydis transcript features identified through full-length cDNA analysis. Mol Genet Genomics 2011; 286:143-59. [PMID: 21750919 DOI: 10.1007/s00438-011-0634-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/24/2011] [Accepted: 06/28/2011] [Indexed: 12/13/2022]
Abstract
Ustilago maydis is the model for investigating basidiomycete biotrophic plant pathogens. To further the annotation of its genome, 12,943 full-length cDNA sequences were used to construct databases for the promoter and untranslated regions of U. maydis genes. A subset of clones was sequenced to determine full cDNA sequences. These and the original ESTs were assembled into contigs representing 3,058, or 45%, of the predicted U. maydis genes. The new sequencing allowed the confirmation of 2,842 gene models, 690 of which contain an intron. The use of full-length cDNA clone sequences ensured that untranslated regions were physically linked to the open reading frames (ORFs), not merely aligned upstream of the start of transcription. Identified sequence features include: (1) over 500 potential short upstream ORFs, (2) 95 gene models that require further annotation, (3) one new potential ORF, (4) varying GC content in different gene regions, (5) a WebLogo motif for the start of translation, (6) the correlation of UTR length with transcript representation in cDNA libraries and with gene function categories, (7) a relationship between natural antisense transcripts and UTR length that differs from that of Saccharomyces cerevisiae, (8) a potential relationship between DNA replication and the control of transcription, and (9) new insights regarding mechanisms for the control of transcription and mRNA maturation in U. maydis.
Affiliation(s)
- Colleen E Doyle
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON K9J 7B8, Canada
|
27
|
Buell CR, Last RL. Twenty-first century plant biology: impacts of the Arabidopsis genome on plant biology and agriculture. Plant Physiology 2010; 154:497-500. [PMID: 20921172 PMCID: PMC2948998 DOI: 10.1104/pp.110.159541] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 05/19/2010] [Accepted: 06/15/2010] [Indexed: 05/28/2023]
Affiliation(s)
| | - Robert L. Last
- Department of Plant Biology (C.R.B., R.L.L.) and Department of Biochemistry and Molecular Biology (R.L.L.), Michigan State University, East Lansing, Michigan 48824–1319
|
28
|
|
29
|
|
30
|
The transcriptome of the early life history stages of the California Sea Hare Aplysia californica. Comparative Biochemistry and Physiology D-Genomics & Proteomics 2010; 5:165-70. [PMID: 20434970 DOI: 10.1016/j.cbd.2010.03.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 12/07/2009] [Revised: 03/25/2010] [Accepted: 03/27/2010] [Indexed: 11/24/2022]
Abstract
Aplysia californica is a marine opisthobranch mollusc used as a model organism in neurobiology for cellular analyses of learning and behavior because it possesses a comparatively small number of neurons of large size. The mollusca comprise the second largest animal phylum, yet detailed genetic and genomic information is only recently beginning to accrue. Thus developmental and comparative evolutionary biology as well as biomedical research would benefit from additional information on DNA sequences of Aplysia. Therefore, we have constructed a series of unidirectional cDNA libraries from different life stages of Aplysia. These include whole organisms from the egg, veliger, metamorphic, and juvenile stages as well as adult neural tissue for reference. Individual clones were randomly picked, and high-throughput, single pass sequence analysis was performed to generate 7971 sequences. Of these, there were 5507 quality-filtered ESTs that clustered into 1988 unigenes, which are annotated and deposited into GenBank. A significant number (497) of ESTs did not match existing Aplysia ESTs and are thus potentially novel sequences for Aplysia. GO and KEGG analyses of these novel sequences indicated that a large number were involved in protein binding and translation, consistent with the predominant biosynthetic role in development and the presence of stage-specific protein isoforms.
|
31
|
Kim S, Park J, Park SY, Mitchell TK, Lee YH. Identification and analysis of in planta expressed genes of Magnaporthe oryzae. BMC Genomics 2010; 11:104. [PMID: 20146797 PMCID: PMC2832786 DOI: 10.1186/1471-2164-11-104] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 09/04/2009] [Accepted: 02/10/2010] [Indexed: 11/14/2022] Open
Abstract
Background Infection of plants by pathogens and the subsequent disease development involves substantial changes in the biochemistry and physiology of both partners. Analysis of genes that are expressed during these interactions represents a powerful strategy to obtain insights into the molecular events underlying these changes. We have employed expressed sequence tag (EST) analysis to identify rice genes involved in defense responses against infection by the blast fungus Magnaporthe oryzae and fungal genes involved in infectious growth within the host during a compatible interaction. Results A cDNA library was constructed with RNA from rice leaves (Oryza sativa cv. Hwacheong) infected with M. oryzae strain KJ201. To enrich for fungal genes, subtraction library using PCR-based suppression subtractive hybridization was constructed with RNA from infected rice leaves as a tester and that from uninfected rice leaves as the driver. A total of 4,148 clones from two libraries were sequenced to generate 2,302 non-redundant ESTs. Of these, 712 and 1,562 ESTs could be identified to encode fungal and rice genes, respectively. To predict gene function, Gene Ontology (GO) analysis was applied, with 31% and 32% of rice and fungal ESTs being assigned to GO terms, respectively. One hundred uniESTs were found to be specific to fungal infection EST. More than 80 full-length fungal cDNA sequences were used to validate ab initio annotated gene model of M. oryzae genome sequence. Conclusion This study shows the power of ESTs to refine genome annotation and functional characterization. Results of this work have advanced our understanding of the molecular mechanisms underpinning fungal-plant interactions and formed the basis for new hypothesis.
Affiliation(s)
- Soonok Kim
- Department of Agricultural Biotechnology, Center for Fungal Pathogenesis, Center for Agricultural Biomaterials and Center for Fungal Genetic Resources, Seoul National University, Seoul 151-921, Korea
|
32
|
Gou X, He K, Yang H, Yuan T, Lin H, Clouse SD, Li J. Genome-wide cloning and sequence analysis of leucine-rich repeat receptor-like protein kinase genes in Arabidopsis thaliana. BMC Genomics 2010. [PMID: 20064227 DOI: 10.1186/1471-2164-11-19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transmembrane receptor kinases play critical roles in both animal and plant signaling pathways regulating growth, development, differentiation, cell death, and pathogenic defense responses. In Arabidopsis thaliana, there are at least 223 Leucine-rich repeat receptor-like kinases (LRR-RLKs), representing one of the largest protein families. Although functional roles for a handful of LRR-RLKs have been revealed, the functions of the majority of members in this protein family have not been elucidated. RESULTS As a resource for the in-depth analysis of this important protein family, the complementary DNA sequences (cDNAs) of 194 LRR-RLKs were cloned into the Gateway donor vector pDONR/Zeo and analyzed by DNA sequencing. Among them, 157 clones showed sequences identical to the predictions in the Arabidopsis sequence resource, TAIR8. The other 37 cDNAs showed gene structures distinct from the predictions of TAIR8, which was mainly caused by alternative splicing of pre-mRNA. Most of the genes have been further cloned into Gateway destination vectors with GFP or FLAG epitope tags and have been transformed into Arabidopsis for in planta functional analysis. All clones from this study have been submitted to the Arabidopsis Biological Resource Center (ABRC) at Ohio State University for full accessibility by the Arabidopsis research community. CONCLUSIONS Most of the Arabidopsis LRR-RLK genes have been isolated and the sequence analysis showed a number of alternatively spliced variants. The generated resources, including cDNA entry clones, expression constructs and transgenic plants, will facilitate further functional analysis of the members of this important gene family.
Affiliation(s)
- Xiaoping Gou
- Department of Botany and Microbiology, University of Oklahoma, Norman, OK 73019, USA
|
33
|
Gou X, He K, Yang H, Yuan T, Lin H, Clouse SD, Li J. Genome-wide cloning and sequence analysis of leucine-rich repeat receptor-like protein kinase genes in Arabidopsis thaliana. BMC Genomics 2010; 11:19. [PMID: 20064227 PMCID: PMC2817689 DOI: 10.1186/1471-2164-11-19] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Received: 04/29/2009] [Accepted: 01/11/2010] [Indexed: 11/19/2022] Open
Abstract
Background Transmembrane receptor kinases play critical roles in both animal and plant signaling pathways regulating growth, development, differentiation, cell death, and pathogenic defense responses. In Arabidopsis thaliana, there are at least 223 Leucine-rich repeat receptor-like kinases (LRR-RLKs), representing one of the largest protein families. Although functional roles for a handful of LRR-RLKs have been revealed, the functions of the majority of members in this protein family have not been elucidated. Results As a resource for the in-depth analysis of this important protein family, the complementary DNA sequences (cDNAs) of 194 LRR-RLKs were cloned into the Gateway donor vector pDONR/Zeo and analyzed by DNA sequencing. Among them, 157 clones showed sequences identical to the predictions in the Arabidopsis sequence resource, TAIR8. The other 37 cDNAs showed gene structures distinct from the predictions of TAIR8, which was mainly caused by alternative splicing of pre-mRNA. Most of the genes have been further cloned into Gateway destination vectors with GFP or FLAG epitope tags and have been transformed into Arabidopsis for in planta functional analysis. All clones from this study have been submitted to the Arabidopsis Biological Resource Center (ABRC) at Ohio State University for full accessibility by the Arabidopsis research community. Conclusions Most of the Arabidopsis LRR-RLK genes have been isolated and the sequence analysis showed a number of alternatively spliced variants. The generated resources, including cDNA entry clones, expression constructs and transgenic plants, will facilitate further functional analysis of the members of this important gene family.
Affiliation(s)
- Xiaoping Gou
- Department of Botany and Microbiology, University of Oklahoma, Norman, OK 73019, USA
|
34
|
Marques MC, Alonso-Cantabrana H, Forment J, Arribas R, Alamar S, Conejero V, Perez-Amador MA. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus. BMC Genomics 2009; 10:428. [PMID: 19747386 PMCID: PMC2754500 DOI: 10.1186/1471-2164-10-428] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 05/05/2009] [Accepted: 09/11/2009] [Indexed: 01/02/2023] Open
Abstract
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species.
Affiliation(s)
- M Carmen Marques
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia and Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, Valencia 46022, Spain.
|
35
|
Upadhyay SK, Shankar J, Singh Y, Basir SF, Madan T, Sarma PU. Expressed sequence tags of Aspergillus fumigatus: Extension of catalogue and their evaluation as putative drug targets and/or diagnostic markers. Indian J Clin Biochem 2009; 24:131-6. [PMID: 23105821 DOI: 10.1007/s12291-009-0024-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]
Abstract
Aspergillus fumigatus, a fungal pathogen, is implicated in a spectrum of allergic and invasive disorders in humans. Validation of the transcriptome of the pathogen is essential for understanding its virulence mechanism and for identifying new therapeutic targets/diagnostic markers. In order to rapidly identify genes of Aspergillus fumigatus, we adopted sequencing of cDNA clones. Our earlier effort led to the identification of 68 expressed sequence tags of Aspergillus fumigatus. The present study describes 52 more expressed sequence tags generated by sequencing 200 phage clones of a non-normalized cDNA library. One of the cDNA clones comprised the complete coding region of a tetratricopeptide repeat domain protein gene. Various homology search algorithms were employed to assign functions to expressed sequence tags coding for hypothetical proteins, and the relevance of these expressed sequence tags or their protein products as drug targets/diagnostic markers was examined by searching for homologues in fungi and humans.
Affiliation(s)
- Santosh Kumar Upadhyay
- Institute of Genomics and Integrative Biology, Mall road, Delhi, 110007 India ; Department of Biosciences, Jamia Millia Islamia, New Delhi, 110025 India
|
36
|
Seki M, Shinozaki K. Functional genomics using RIKEN Arabidopsis thaliana full-length cDNAs. J Plant Res 2009; 122:355-66. [PMID: 19412652 DOI: 10.1007/s10265-009-0239-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/03/2009] [Accepted: 04/08/2009] [Indexed: 05/24/2023]
Abstract
Full-length cDNAs are essential for the correct annotation of genomic sequences as well as for the functional analysis of genes and their products. We have isolated about 240,000 RIKEN Arabidopsis full-length (RAFL) cDNA clones. These clones were clustered into about 17,000 non-redundant cDNA groups, i.e., about 60% of all Arabidopsis predicted genes. The sequence information of the RAFL cDNAs is useful for promoter analysis, and for the correct annotation of predicted transcriptional units and gene products. We prepared cDNA microarrays containing independent full-length cDNA groups and studied the expression profiles of genes under various stress- and hormone-treatment conditions, and in various mutants and transgenic plants. These expression profiling studies have shown the expression levels of many genes as a detailed snapshot describing the state of a biological system in planta under various conditions. We have applied RAFL cDNAs to the functional analysis of proteins using the full-length cDNA over-expressing (FOX) gene hunting system and the wheat germ cell-free protein synthesis system. The RAFL cDNA collection was also used for determination of the domain structure of proteins by NMR. In this review, we summarize the present state and perspectives of functional genomics using RAFL cDNAs.
Affiliation(s)
- Motoaki Seki
- Plant Genomic Network Research Team, Plant Functional Genomics Research Group, RIKEN Plant Science Center, RIKEN Yokohama Institute, Yokohama 230-0045, Japan.
|
37
|
Park K, Dirisala VR, Oh Y, Choi H, Lee KT, Kim JH, Lee HT, Seo KH, Park C. Reporting 678 putative cSNPs from full-length enriched cDNA sequences of the Korean native pig. J Anim Breed Genet 2009; 126:127-33. [PMID: 19320769 DOI: 10.1111/j.1439-0388.2008.00765.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
Sequences from the clones of full-length enriched cDNA libraries serve as valuable resources for functional genomic studies. We have analysed 1970 high-quality chromatograms (Phred value ≥ 30) that were obtained from sequencing the 5' ends of brainstem, liver, neocortex and spleen clones derived from full-length enriched cDNA libraries from Korean native pigs. In addition, 50,000 pig expressed sequence tag (EST) sequence trace files were obtained from GenBank and combined with our sequencing information to facilitate SNP identification in silico. The process generated 8118 contigs, of which 239 included at least one sequence from Korean native pig and contained 678 putative coding single nucleotide polymorphisms (cSNPs). Of these, 33 putative cSNPs were randomly selected for confirmatory analysis and validated using 20 pigs from four different breeds (Duroc, Landrace, Yorkshire, Korean native pig). Of the 33 putative cSNPs, 20 were confirmed (61%), which was similar to the frequency reported in other studies. We also identified 15 new cSNPs from the validation process, which were not detected by our in silico analysis. Our study shows that analysing genetically diverse pig breeds including the Korean native pig could serve as a useful strategy for generating a large number of cSNPs.
Affiliation(s)
- K Park
- Department of Animal Biotechnology, Konkuk University, Seoul, Republic of Korea
|
38
|
Gu L, Guo R. Genome-wide detection and analysis of alternative splicing for nucleotide binding site-leucine-rich repeats sequences in rice. J Genet Genomics 2009; 34:247-57. [PMID: 17498622 DOI: 10.1016/s1673-8527(07)60026-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/07/2006] [Accepted: 08/03/2006] [Indexed: 11/20/2022]
Abstract
Alternative splicing is a major contributor to genomic complexity and proteome diversity, yet alternative splicing of sequences containing the nucleotide binding site and leucine-rich repeat (NBS-LRR) domain has not been explored in rice (Oryza sativa L.). Hidden Markov model (HMM) searches were performed for the NBS-LRR domain. A total of 875 NBS-LRR-encoding sequences were obtained from the Institute for Genomic Research (TIGR). All of them were used in BLAST searches against the Knowledge-based Oryza Molecular Biological Encyclopaedia (KOME), the TIGR rice gene index (TGI), and the Universal Protein Resource (UniProt) to obtain homologous full-length cDNAs (FL-cDNAs), tentative consensus sequences, and protein sequences. Alternative splicing events were detected from genomic alignment of FL-cDNAs, tentative consensus sequences, and protein sequences, which provide valuable information on splice variants of genes. These sequences were aligned to the corresponding BAC sequences using the Spidey and Sim4 programs and each of the proteins was aligned by tBLASTn. Of the 875 NBS-LRR sequences, 119 (13.6%) sequences had alternative splicing where multiple FL-cDNAs, TGI sequences and proteins corresponded to the same gene. In total, 71 intron retention events, 20 exon skipping events, 16 alternative termination events, 25 alternative initiation events, 12 alternative 5' splicing events, and 16 alternative 3' splicing events were identified. Most of these alternative splices were supported by two or more transcripts. The data sets are available at http://www.bioinfor.org. Furthermore, the bioinformatics analysis of splice boundaries showed that exon skipping and intron retention did not exhibit strong consensus. This implies a different regulation mechanism that guides the expression of splice isoforms. This article also presents the analysis of the effects of intron retention on proteins. The C-terminal regions of alternative proteins turned out to be more variable than the N-terminal regions.
Finally, tissue distribution and protein localization of alternative splicing were explored. The largest categories of tissue distributions for alternative splicing were shoot and callus. More than one-third of the protein localizations for splice forms were plasma membrane and cytoplasm. All the NBS-LRR proteins for splice forms may have important functions in disease resistance and activate downstream signaling pathways.
Affiliation(s)
- Lianfeng Gu
- College of Agriculture, Guangdong Ocean University, Zhanjiang 524088, China
|
39
|
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob-ur-Rahman, Ware D, Westhoff P, Mayer KFX, Messing J, Rokhsar DS. The Sorghum bicolor genome and the diversification of grasses. Nature 2009; 457:551-6. [PMID: 19189423 DOI: 10.1038/nature07723] [Citation(s) in RCA: 1638] [Impact Index Per Article: 109.2] [Indexed: 12/25/2022]
Abstract
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
Affiliation(s)
- Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602, USA.
|
40
|
Zeba N, Isbat M, Kwon NJ, Lee MO, Kim SR, Hong CB. Heat-inducible C3HC4 type RING zinc finger protein gene from Capsicum annuum enhances growth of transgenic tobacco. Planta 2009; 229:861-71. [PMID: 19125289 DOI: 10.1007/s00425-008-0884-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/20/2008] [Accepted: 12/16/2008] [Indexed: 05/27/2023]
Abstract
Capsicum annuum RING Zinc Finger Protein 1 (CaRZFP1) is a novel C3HC4-type RING zinc finger protein gene that was previously isolated from a cDNA library of heat-shock-treated hot pepper plants. CaRZFP1 was inducible by diverse environmental stresses in hot pepper plants. We introduced CaRZFP1 into the Wisconsin 38 cultivar of tobacco (Nicotiana tabacum) by Agrobacterium-mediated transformation under the control of the CaMV 35S promoter. Expression of the transgene in the transformed tobacco plants was demonstrated by RNA blot analyses. Over-expression of the transgene appeared to have no adverse effect on the overall growth and development of the transformants. Genetic analysis of the tested T1 lines showed that the transgene segregated in a Mendelian fashion. Transgenic tobacco lines that expressed the CaRZFP1 gene were compared with several different empty vector lines and exhibited enhanced growth: they had larger primary roots, more lateral roots, larger hypocotyls and bigger leaves, resulting in heavier fresh weight. The enhanced growth of the transgenic lines was accompanied by longer vegetative growth, which resulted in bigger plants with a higher number of leaves. Microarray analysis revealed the up-regulation of some growth-related genes in the transgenic plants, which was verified by specific oligomer RNA blot analyses. These results indicate that CaRZFP1 activates and up-regulates some growth-related proteins, thereby effectively promoting plant growth.
Affiliation(s)
- Naheed Zeba
- School of Biological Sciences and Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 151-742, South Korea
|
41
|
Grigsby IF, Rutledge EM, Morton CA, Finger FP. Functional redundancy of two C. elegans homologs of the histone chaperone Asf1 in germline DNA replication. Dev Biol 2009; 329:64-79. [PMID: 19233156 DOI: 10.1016/j.ydbio.2009.02.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 09/19/2008] [Revised: 01/30/2009] [Accepted: 02/11/2009] [Indexed: 11/20/2022]
Abstract
Eukaryotic genomes contain either one or two genes encoding homologs of the highly conserved histone chaperone Asf1; however, little is known of their in vivo roles in animal development. UNC-85 is one of the two Caenorhabditis elegans Asf1 homologs and functions in post-embryonic replication in neuroblasts. Although UNC-85 is broadly expressed in replicating cells, the specificity of the mutant phenotype suggested possible redundancy with the second C. elegans Asf1 homolog, ASFL-1. The asfl-1 mRNA is expressed in the meiotic region of the germline, and mutants in either Asf1 gene have reduced brood sizes and low-penetrance defects in gametogenesis. The asfl-1, unc-85 double mutants are sterile, displaying defects in oogenesis and spermatogenesis, and analysis of DNA synthesis revealed that DNA replication in the germline is blocked. Analysis of somatic phenotypes previously observed in unc-85 mutants revealed that they are neither observed in asfl-1 mutants nor enhanced in the double mutants, with the exception of enhanced male tail abnormalities in the double mutants. These results suggest that the two Asf1 homologs have partially overlapping functions in the germline, while UNC-85 is primarily responsible for several Asf1 functions in somatic cells, and is more generally involved in replication throughout development.
Affiliation(s)
- Iwen F Grigsby
- Department of Biology and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Biotech-BCHM-2, Troy, NY 12180, USA
|
42
|
Umezawa T, Sakurai T, Totoki Y, Toyoda A, Seki M, Ishiwata A, Akiyama K, Kurotani A, Yoshida T, Mochida K, Kasuga M, Todaka D, Maruyama K, Nakashima K, Enju A, Mizukado S, Ahmed S, Yoshiwara K, Harada K, Tsubokura Y, Hayashi M, Sato S, Anai T, Ishimoto M, Funatsuki H, Teraishi M, Osaki M, Shinano T, Akashi R, Sakaki Y, Yamaguchi-Shinozaki K, Shinozaki K. Sequencing and analysis of approximately 40,000 soybean cDNA clones from a full-length-enriched cDNA library. DNA Res 2008; 15:333-46. [PMID: 18927222 PMCID: PMC2608845 DOI: 10.1093/dnares/dsn024] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 06/03/2008] [Accepted: 09/10/2008] [Indexed: 11/14/2022] Open
Abstract
A large collection of full-length cDNAs is essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We obtained a total of 39,936 soybean cDNA clones (GMFL01 and GMFL02 clone sets) in a full-length-enriched cDNA library which was constructed from soybean plants that were grown under various developmental and environmental conditions. Sequencing from 5' and 3' ends of the clones generated 68,661 expressed sequence tags (ESTs). The EST sequences were clustered into 22,674 scaffolds involving 2580 full-length sequences. In addition, we sequenced 4712 full-length cDNAs. After removing overlaps, we obtained 6570 new full-length sequences of soybean cDNAs so far. Our data indicated that 87.7% of the soybean cDNA clones contain complete coding sequences in addition to 5'- and 3'-untranslated regions. All of the obtained data confirmed that our collection of soybean full-length cDNAs covers a wide variety of genes. Comparative analysis between the derived sequences from soybean and Arabidopsis, rice or other legumes data revealed that some specific genes were involved in our collection and a large part of them could be annotated to unknown functions. A large set of soybean full-length cDNA clones reported in this study will serve as a useful resource for gene discovery from soybean and will also aid a precise annotation of the soybean genome.
Affiliation(s)
- Taishi Umezawa
- Gene Discovery Research Team, RIKEN Plant Science Center, Koyadai 3-1-1, Tsukuba, Ibaraki 305-0074, Japan
- Tetsuya Sakurai
- Integrated Genome Informatics Research Unit, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Yasushi Totoki
- Genome Annotation and Comparative Analysis Team, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Atsushi Toyoda
- Sequence Technology Team, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Motoaki Seki
- Plant Genomic Network Research Team, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Atsushi Ishiwata
- Integrated Genome Informatics Research Unit, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Kenji Akiyama
- Integrated Genome Informatics Research Unit, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Atsushi Kurotani
- Integrated Genome Informatics Research Unit, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Takuhiro Yoshida
- Integrated Genome Informatics Research Unit, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Keiichi Mochida
- Gene Discovery Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Mie Kasuga
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Daisuke Todaka
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Laboratory of Plant Molecular Physiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kyonoshin Maruyama
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Kazuo Nakashima
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Akiko Enju
- Plant Genomic Network Research Team, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Saho Mizukado
- Gene Discovery Research Team, RIKEN Plant Science Center, Koyadai 3-1-1, Tsukuba, Ibaraki 305-0074, Japan
- Selina Ahmed
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Kyoko Yoshiwara
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Kyuya Harada
- National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
- Yasutaka Tsubokura
- National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
- Masaki Hayashi
- National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
- Shusei Sato
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan
- Toyoaki Anai
- Department of Applied Biological Sciences, Faculty of Agriculture, Saga University, Honjo 840-8502, Saga, Japan
- Masao Ishimoto
- National Agricultural Research Center for Hokkaido Region, 1 Hitsujigaoka, Sapporo, Hokkaido 062-8555, Japan
- Hideyuki Funatsuki
- National Agricultural Research Center for Hokkaido Region, 1 Hitsujigaoka, Sapporo, Hokkaido 062-8555, Japan
- Mitsuru Osaki
- Graduate School of Agriculture, Hokkaido University, Sapporo, Hokkaido 060-8589, Japan
- Takuro Shinano
- National Agricultural Research Center for Hokkaido Region, 1 Hitsujigaoka, Sapporo, Hokkaido 062-8555, Japan
- Ryo Akashi
- Division of BioResource, Frontier Science Research Center, University of Miyazaki, Miyazaki 889-2192, Japan
- Yoshiyuki Sakaki
- Genome Annotation and Comparative Analysis Team, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Sequence Technology Team, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Kazuko Yamaguchi-Shinozaki
- Biological Resources Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki 305-8686, Japan
- Laboratory of Plant Molecular Physiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kazuo Shinozaki
- Gene Discovery Research Team, RIKEN Plant Science Center, Koyadai 3-1-1, Tsukuba, Ibaraki 305-0074, Japan
|
43
|
Kapustin Y, Souvorov A, Tatusova T, Lipman D. Splign: algorithms for computing spliced alignments with identification of paralogs. Biol Direct 2008; 3:20. [PMID: 18495041 PMCID: PMC2440734 DOI: 10.1186/1745-6150-3-20] [Citation(s) in RCA: 244] [Impact Index Per Article: 15.3] [Received: 05/13/2008] [Accepted: 05/21/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. RESULTS We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. CONCLUSION Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes.
Affiliation(s)
- Yuri Kapustin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, USA.
|
44
|
Grigsby IF, Finger FP. UNC-85, a C. elegans homolog of the histone chaperone Asf1, functions in post-embryonic neuroblast replication. Dev Biol 2008; 319:100-9. [PMID: 18490010 DOI: 10.1016/j.ydbio.2008.04.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 12/21/2007] [Revised: 04/08/2008] [Accepted: 04/08/2008] [Indexed: 11/28/2022]
Abstract
Normal animal development requires accurate cell divisions, not only in the early stages of rapid embryonic cleavages, but also in later developmental stages. The Caenorhabditis elegans unc-85 gene is implicated only in cell divisions that occur post-embryonically, primarily in terminal neuronal lineages. Variable post-embryonic cell division failures in ventral cord motoneuron precursors result in uncoordinated locomotion of unc-85 mutant larvae by the second larval stage. These neuroblast cell division failures often result in unequally sized daughter nuclei, and sometimes in nuclear fusions. Using a combination of conventional mapping techniques and microarray analysis, we cloned the unc-85 gene, and find that it encodes one of two C. elegans homologs of the yeast Anti-silencing function 1 (Asf1) histone chaperone. The unc-85 gene is expressed in replicating cells throughout development, and the protein is localized in nuclei. Examination of null mutants confirms that embryonic neuroblast cell divisions occur normally, but post-embryonic neuroblast cell divisions fail. Analysis of the DNA content of the mutant neurons indicates that defective replication in post-embryonic neuroblasts gives rise to ventral cord neurons with an average DNA content of approximately 2.5 n. We conclude that UNC-85 functions in post-embryonic DNA replication in ventral cord motor neuron precursors.
Affiliation(s)
- Iwen F Grigsby
- Biology Department and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th Street, Biotech-BCHM-2, Troy, NY 12180, USA
|
45
|
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 2008; 9:R7. [PMID: 18190707 PMCID: PMC2395244 DOI: 10.1186/gb-2008-9-1-r7] [Citation(s) in RCA: 1884] [Impact Index Per Article: 117.8] [Received: 09/26/2007] [Revised: 12/17/2007] [Accepted: 01/11/2008] [Indexed: 01/16/2023] Open
Abstract
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
Affiliation(s)
- Brian J Haas
- J Craig Venter Institute, The Institute for Genomic Research, Rockville, Maryland 20850, USA.
|
46
|
Liu Q, Mackey AJ, Roos DS, Pereira FCN. Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 2008; 24:597-605. [PMID: 18187439 DOI: 10.1093/bioinformatics/btn004] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION The increasing diversity and variable quality of evidence relevant to gene annotation argues for a probabilistic framework that automatically integrates such evidence to yield candidate gene models. RESULTS Evigan is an automated gene annotation program for eukaryotic genomes, employing probabilistic inference to integrate multiple sources of gene evidence. The probabilistic model is a dynamic Bayes network whose parameters are adjusted to maximize the probability of observed evidence. Consensus gene predictions are then derived by maximum likelihood decoding, yielding n-best models (with probabilities for each). Evigan is capable of accommodating a variety of evidence types, including (but not limited to) gene models computed by diverse gene finders, BLAST hits, EST matches, and splice site predictions; learned parameters encode the relative quality of evidence sources. Since separate training data are not required (apart from the training sets used by individual gene finders), Evigan is particularly attractive for newly sequenced genomes where little or no reliable manually curated annotation is available. The ability to produce a ranked list of alternative gene models may facilitate identification of alternatively spliced transcripts. Experimental application to ENCODE regions of the human genome, and the genomes of Plasmodium vivax and Arabidopsis thaliana show that Evigan achieves better performance than any of the individual data sources used as evidence. AVAILABILITY The source code is available at http://www.seas.upenn.edu/~strctlrn/evigan/evigan.html.
Affiliation(s)
- Qian Liu
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia PA 19104, USA.
|
47
|
Thibaud-Nissen F, Campbell M, Hamilton JP, Zhu W, Buell CR. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome. BMC Genomics 2007; 8:388. [PMID: 17961238 PMCID: PMC2151081 DOI: 10.1186/1471-2164-8-388] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/03/2007] [Accepted: 10/25/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort. RESULTS We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. CONCLUSION We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.
|
48
|
Pertea M, Mount SM, Salzberg SL. A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics 2007; 8:159. [PMID: 17517127 PMCID: PMC1892810 DOI: 10.1186/1471-2105-8-159] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Received: 10/09/2006] [Accepted: 05/21/2007] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However, exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites. RESULTS We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers are identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software. CONCLUSION Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.
Affiliation(s)
- Mihaela Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
|
49
|
D'Agostino N, Traini A, Frusciante L, Chiusano ML. Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome. BMC Bioinformatics 2007; 8 Suppl 1:S9. [PMID: 17430576 PMCID: PMC1885861 DOI: 10.1186/1471-2105-8-s1-s9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The structure annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have been already annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define exon/intron structure of mRNA coding genes. RESULTS GeneModelEST is the software here proposed for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced-alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. CONCLUSION The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it is useful to filter out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human curated annotation.
Affiliation(s)
- Nunzio D'Agostino
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
- Alessandra Traini
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
- Luigi Frusciante
- Department of Soil, Plant, and Environmental Sciences, University 'Federico II', 80055 Portici, Naples, Italy
- Maria Luisa Chiusano
- Department of Structural and Functional Biology, University 'Federico II', 80126 Naples, Italy
|
50
|
Kumar S, Dutta A, Sinha AK, Sen J. Cloning, characterization and localization of a novel basic peroxidase gene from Catharanthus roseus. FEBS J 2007; 274:1290-303. [PMID: 17298442 DOI: 10.1111/j.1742-4658.2007.05677.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Catharanthus roseus (L.) G. Don produces a number of biologically active terpenoid indole alkaloids via a complex terpenoid indole alkaloid biosynthetic pathway. The final dimerization step of this pathway, leading to the synthesis of a dimeric alkaloid, vinblastine, was demonstrated to be catalyzed by a basic peroxidase. However, reports of the gene encoding this enzyme are scarce for C. roseus. We report here for the first time the cloning, characterization and localization of a novel basic peroxidase, CrPrx, from C. roseus. A 394 bp partial peroxidase cDNA (CrInt1) was initially amplified from the internodal stem tissue, using degenerate oligonucleotide primers, and cloned. The full-length coding region of CrPrx cDNA was isolated by screening a leaf-specific cDNA library with CrInt1 as probe. The CrPrx nucleotide sequence encodes a deduced translation product of 330 amino acids with a 21 amino acid signal peptide, suggesting that CrPrx is secretory in nature. The molecular mass of this unprocessed and unmodified deduced protein is estimated to be 37.43 kDa, and the pI value is 8.68. CrPrx was found to belong to a 'three intron' category of gene that encodes a class III basic secretory peroxidase. CrPrx protein and mRNA were found to be present in specific organs and were regulated by different stress treatments. Using a beta-glucuronidase-green fluorescent protein fusion of CrPrx protein, we demonstrated that the fused protein is localized in leaf epidermal and guard cell walls of transiently transformed tobacco. We propose that CrPrx is involved in cell wall synthesis, and also that the gene is induced under methyl jasmonate treatment. Its potential involvement in the terpenoid indole alkaloid biosynthetic pathway is discussed.
Affiliation(s)
- Santosh Kumar
- National Centre for Plant Genome Research, JNU Campus, Aruna Asaf Ali Marg, New Delhi 110-067, India
|