1
Pelosi B. Developing a bioinformatics pipeline for comparative protein classification analysis. BMC Genom Data 2022; 23:43. [PMID: 35668373 PMCID: PMC9172112 DOI: 10.1186/s12863-022-01045-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 03/11/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Protein classification is a task of paramount importance in various fields of biology. Despite the great momentum of modern implementations of protein classification, machine learning techniques such as Random Forest and Neural Network cannot always be used, for several reasons: data collection, unbalanced classification or labelling of the data. As an alternative, I propose the use of a bioinformatics pipeline to search for and classify information from protein databases. To evaluate the efficiency and accuracy of the pipeline, I focused on the carotenoid biosynthetic genes and developed a filtering approach to retrieve ortholog clusters in two well-studied plants that belong to the Brassicaceae family: Arabidopsis thaliana and Brassica rapa Pekinensis group. The results obtained were compared with previous studies on carotenoid biosynthetic genes in B. rapa in which phylogenetic analysis was conducted. RESULTS The developed bioinformatics pipeline relies on commercial software and multiple databases, including the use of phylogeny, Gene Ontology terms (GOs) and Protein Families (Pfams) at the protein level. Furthermore, the phylogeny is coupled with "population analysis" to evaluate the potential orthologs. All the steps taken together give a final table of potential orthologs. The phylogenetic tree yields 43 putative orthologs conserved in B. rapa Pekinensis group. Several A. thaliana proteins have more than one syntenic ortholog, as also shown in a previous finding (Li et al., BMC Genomics 16(1):1-11, 2015). CONCLUSIONS This study demonstrates that, when the biological features of the proteins of interest are not specific, I can rely on a computational approach in filtering steps for classification purposes. The comparison of the results obtained here for the carotenoid biosynthetic genes with previous research confirmed the accuracy of the developed pipeline, which can therefore be applied for filtering different types of datasets.
Affiliation(s)
- Benedetta Pelosi
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.
2
Zhang W, Corwin JA, Copeland DH, Feusier J, Eshbaugh R, Cook DE, Atwell S, Kliebenstein DJ. Plant-necrotroph co-transcriptome networks illuminate a metabolic battlefield. eLife 2019; 8:e44279. [PMID: 31081752 PMCID: PMC6557632 DOI: 10.7554/elife.44279] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/10/2018] [Accepted: 05/08/2019] [Indexed: 12/27/2022] Open
Abstract
A central goal of studying host-pathogen interaction is to understand how host and pathogen manipulate each other to promote their own fitness in a pathosystem. Co-transcriptomic approaches can simultaneously analyze dual transcriptomes during infection and provide a systematic map of the cross-kingdom communication between two species. Here we used the Arabidopsis-B. cinerea pathosystem to test how plant host and fungal pathogen interact at the transcriptomic level. We assessed the impact of genetic diversity in pathogen and host by infecting Arabidopsis wild type and two mutants with compromised jasmonate or salicylic acid immunity with a collection of 96 isolates. We identified ten B. cinerea gene co-expression networks (GCNs) that encode known or novel virulence mechanisms. Construction of a dual interaction network by combining four host- and ten pathogen-GCNs revealed potential connections between the fungal and plant GCNs. These co-transcriptome data shed light on the potential mechanisms underlying host-pathogen interaction.
Affiliation(s)
- Wei Zhang
- Department of Plant Pathology, Kansas State University, Manhattan, United States
- Department of Plant Sciences, University of California, Davis, Davis, United States
- Jason A Corwin
- Department of Ecology and Evolution Biology, University of Colorado, Boulder, United States
- Julie Feusier
- Department of Plant Sciences, University of California, Davis, Davis, United States
- Robert Eshbaugh
- Department of Plant Sciences, University of California, Davis, Davis, United States
- David E Cook
- Department of Plant Pathology, Kansas State University, Manhattan, United States
- Suzi Atwell
- Department of Plant Sciences, University of California, Davis, Davis, United States
- Daniel J Kliebenstein
- Department of Plant Sciences, University of California, Davis, Davis, United States
- DynaMo Center of Excellence, University of Copenhagen, Frederiksberg, Denmark
3
Castro PH, Santos MÂ, Freitas S, Cana-Quijada P, Lourenço T, Rodrigues MAA, Fonseca F, Ruiz-Albert J, Azevedo JE, Tavares RM, Castillo AG, Bejarano ER, Azevedo H. Arabidopsis thaliana SPF1 and SPF2 are nuclear-located ULP2-like SUMO proteases that act downstream of SIZ1 in plant development. JOURNAL OF EXPERIMENTAL BOTANY 2018; 69:4633-4649. [PMID: 30053161 PMCID: PMC6117582 DOI: 10.1093/jxb/ery265] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 05/18/2023]
Abstract
Post-translational modifiers such as the small ubiquitin-like modifier (SUMO) peptide act as fast and reversible protein regulators. Functional characterization of the sumoylation machinery has determined the key regulatory role that SUMO plays in plant development. Unlike components of the SUMO conjugation pathway, SUMO proteases (ULPs) are encoded by a relatively large gene family and are potential sources of specificity within the pathway. This study reports a thorough comparative genomics and phylogenetic characterization of plant ULPs, revealing the presence of one ULP1-like and three ULP2-like SUMO protease subgroups within plant genomes. As representatives of an under-studied subgroup, Arabidopsis SPF1 and SPF2 were subjected to functional characterization. Loss-of-function mutants implicated both proteins with vegetative growth, flowering time, and seed size and yield. Mutants constitutively accumulated SUMO conjugates, and yeast complementation assays associated these proteins with the function of ScUlp2 but not ScUlp1. Fluorescence imaging placed both proteins in the plant cell nucleoplasm. Transcriptomics analysis indicated strong regulatory involvement in secondary metabolism, cell wall remodelling, and nitrate assimilation. Furthermore, developmental defects of the spf1-1 spf2-2 (spf1/2) double-mutant opposed those of the major E3 ligase siz1 mutant and, most significantly, developmental and transcriptomic characterization of the siz1 spf1/2 triple-mutant placed SIZ1 as epistatic to SPF1 and SPF2.
Affiliation(s)
- Pedro Humberto Castro
- Biosystems & Integrative Sciences Institute (BioISI), Plant Functional Biology Center (CBFP), University of Minho, Campus de Gualtar, Braga, Portugal
- Area de Genética, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus Teatinos, Málaga, Spain
- CIBIO, InBIO—Research Network in Biodiversity and Evolutionary Biology, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Miguel Ângelo Santos
- Biosystems & Integrative Sciences Institute (BioISI), Plant Functional Biology Center (CBFP), University of Minho, Campus de Gualtar, Braga, Portugal
- Sara Freitas
- Biosystems & Integrative Sciences Institute (BioISI), Plant Functional Biology Center (CBFP), University of Minho, Campus de Gualtar, Braga, Portugal
- CIBIO, InBIO—Research Network in Biodiversity and Evolutionary Biology, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Pepe Cana-Quijada
- Area de Genética, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus Teatinos, Málaga, Spain
- Tiago Lourenço
- Biosystems & Integrative Sciences Institute (BioISI), Plant Functional Biology Center (CBFP), University of Minho, Campus de Gualtar, Braga, Portugal
- Mafalda A A Rodrigues
- PRPlants Lab, GPlantS Unit, Instituto de Tecnologia Química e Biológica—Universidade Nova de Lisboa, Estação Agronómica Nacional, Oeiras, Portugal
- Fátima Fonseca
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Porto, Portugal
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
- Javier Ruiz-Albert
- Area de Genética, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus Teatinos, Málaga, Spain
- Jorge E Azevedo
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Porto, Portugal
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
- Instituto de Ciências Biomédicas de Abel Salazar (ICBAS), Universidade do Porto, Porto, Portugal
- Rui Manuel Tavares
- Biosystems & Integrative Sciences Institute (BioISI), Plant Functional Biology Center (CBFP), University of Minho, Campus de Gualtar, Braga, Portugal
- Araceli G Castillo
- Area de Genética, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus Teatinos, Málaga, Spain
- Eduardo R Bejarano
- Area de Genética, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus Teatinos, Málaga, Spain
- Herlander Azevedo
- CIBIO, InBIO—Research Network in Biodiversity and Evolutionary Biology, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
4
Reiser L, Berardini TZ, Li D, Muller R, Strait EM, Li Q, Mezheritsky Y, Vetushko A, Huala E. Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw018. [PMID: 26989150 PMCID: PMC4795935 DOI: 10.1093/database/baw018] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 11/13/2015] [Accepted: 02/03/2016] [Indexed: 11/13/2022]
Abstract
Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long-term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR) founded the nonprofit organization Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding, but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org.
Affiliation(s)
- Leonore Reiser
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Tanya Z Berardini
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Donghui Li
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Robert Muller
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Emily M Strait
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Qian Li
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Yarik Mezheritsky
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Andrey Vetushko
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
- Eva Huala
- Phoenix Bioinformatics, The Arabidopsis Information Resource, 643 Bair Island Rd. Suite 403, Redwood City, CA 94063, USA
5
Campbell MS, Yandell M. An Introduction to Genome Annotation. CURRENT PROTOCOLS IN BIOINFORMATICS 2015; 52:4.1.1-4.1.17. [PMID: 26678385 DOI: 10.1002/0471250953.bi0401s52] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.
Affiliation(s)
- Michael S Campbell
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah
- Mark Yandell
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah
6
Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E, Huala E. The Arabidopsis information resource: Making and mining the "gold standard" annotated reference plant genome. Genesis 2015; 53:474-85. [PMID: 26201819 PMCID: PMC4545719 DOI: 10.1002/dvg.22877] [Citation(s) in RCA: 608] [Impact Index Per Article: 67.6] [Received: 04/08/2015] [Revised: 07/15/2015] [Accepted: 07/15/2015] [Indexed: 11/09/2022]
Abstract
The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR's biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high throughput datasets to present a high quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips, and other experimental resources. TAIR actively engages with its users who contribute expertise and data that augments the work of the curatorial staff. TAIR's focus in an extensive and evolving ecosystem of online resources for plant biology is on the critically important role of extracting experimentally based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible.
Affiliation(s)
- Donghui Li
- Phoenix Bioinformatics, Redwood City, California
- Emily Strait
- Phoenix Bioinformatics, Redwood City, California
- Eva Huala
- Phoenix Bioinformatics, Redwood City, California
7
Baxevanis AD, Bateman A. The Importance of Biological Databases in Biological Discovery. CURRENT PROTOCOLS IN BIOINFORMATICS 2015; 50:1.1.1-1.1.8. [PMID: 26094768 DOI: 10.1002/0471250953.bi0101s50] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
Abstract
Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed.
Affiliation(s)
- Alex Bateman
- European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
8
Li D, Dreher K, Knee E, Brkljacic J, Grotewold E, Berardini TZ, Lamesch P, Garcia-Hernandez M, Reiser L, Huala E. Arabidopsis database and stock resources. Methods Mol Biol 2014; 1062:65-96. [PMID: 24057361 DOI: 10.1007/978-1-62703-580-4_4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 03/18/2023]
Abstract
The volume of Arabidopsis information has increased enormously in recent years as a result of the sequencing of the reference genome and other large-scale functional genomics projects. Much of the data is stored in public databases, where data are organized, analyzed, and made freely accessible to the research community. These databases are resources that researchers can utilize for making predictions and developing testable hypotheses. The methods in this chapter describe ways to access and utilize Arabidopsis data and genomic resources found in databases and stock centers.
Affiliation(s)
- Donghui Li
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
9
Integration of latex protein sequence data provides comprehensive functional overview of latex proteins. Mol Biol Rep 2014; 41:1469-81. [PMID: 24395295 DOI: 10.1007/s11033-013-2992-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/02/2013] [Accepted: 12/24/2013] [Indexed: 01/03/2023]
Abstract
The laticiferous system is one of the most important conduit systems in higher plants and produces a milky sap known as latex. Latex contains diverse secondary metabolites with various ecological functions. To obtain a comprehensive overview of the latex proteome, we integrated available latex protein sequences and constructed a comprehensive dataset composed of 1,208 non-redundant latex proteins from 20 latex-bearing plants. The results of functional analyses revealed that latex proteins are involved in various biological processes, including transcription, translation, protein degradation and the plant response to environmental stimuli. The results of the comparative analysis showed that the functions of the latex proteins are similar to those of phloem, suggesting the functional conservation of plant vascular proteins. The presence of latex proteins in mitochondria and plastids suggests the production of diverse secondary metabolites. Furthermore, using a BLAST search, we identified 854 homologous latex proteins in eight plant species, including three latex-bearing plants (papaya, castor bean and cassava), suggesting that latex proteins were newly evolved in vascular plants. Taken together, this study is the largest and most comprehensive in silico analysis of the latex proteome. The results obtained here provide useful resources and information for characterizing the evolution of the latex proteome.
10
Nowacka M, Strozycki PM, Jackowiak P, Hojka-Osinska A, Szymanski M, Figlerowicz M. Identification of stable, high copy number, medium-sized RNA degradation intermediates that accumulate in plants under non-stress conditions. PLANT MOLECULAR BIOLOGY 2013; 83:191-204. [PMID: 23708952 PMCID: PMC3777163 DOI: 10.1007/s11103-013-0079-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/12/2012] [Accepted: 05/15/2013] [Indexed: 05/22/2023]
Abstract
It is becoming increasingly evident that the RNA degradome is a crucial component of the total cellular RNA pool. Here, we present an analysis of the medium-sized RNAs (midi RNAs) that form in Arabidopsis thaliana. Our analyses revealed that the midi RNA fraction contained mostly 20-70-nt-long fragments derived from various RNA species, including tRNA, rRNA, mRNA and snRNA. The majority of these fragments could be classified as stable RNA degradation intermediates (RNA degradants). Using two-dimensional polyacrylamide gel electrophoresis, we demonstrated that high copy number RNA (hcn RNA) degradants appear in plant cells not only during stress, as was suggested earlier; they are also continuously produced under physiological conditions. The data collected indicated that the accumulation pattern of the hcn RNA degradants is organ-specific and can be affected by various endogenous and exogenous factors. In addition, we demonstrated that selected degradants efficiently inhibit translation in vitro. Thus, the results of our studies suggest that hcn RNA degradants are likely to be involved in the regulation of gene expression in plants.
Affiliation(s)
- Martyna Nowacka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Present Address: Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Trojdena 4, 02-109 Warsaw, Poland
- Pawel M. Strozycki
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Paulina Jackowiak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Anna Hojka-Osinska
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Maciej Szymanski
- Computational Genomics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
- Marek Figlerowicz
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Institute of Computing Science, Poznan University of Technology, Piotrowo 3A, 60-965 Poznan, Poland
11
Kim J, Park JH, Lim CJ, Lim JY, Ryu JY, Lee BW, Choi JP, Kim WB, Lee HY, Choi Y, Kim D, Hur CG, Kim S, Noh YS, Shin C, Kwon SY. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars. BMC Genomics 2012; 13:657. [PMID: 23171001 PMCID: PMC3527192 DOI: 10.1186/1471-2164-13-657] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 02/15/2012] [Accepted: 10/22/2012] [Indexed: 12/21/2022] Open
Abstract
Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite the high demand for roses, rose breeding programs are limited in the molecular resources that could greatly enhance and speed breeding efforts. A better understanding of the genes that contribute to floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expand upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among the identified miRNAs, 25 were novel and 242 were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic resource which can be used to better understand rose flower development and to identify candidate genes for important phenotypes.
Affiliation(s)
- Jungeun Kim
- Green Bio Research Center, 125 Gwahak-ro, Yuseong-gu, Daejeon 305-806, Republic of Korea
12
Platzer A, Nizhynska V, Long Q. TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data. BIOLOGY 2012; 1:395-410. [PMID: 24832231 PMCID: PMC4009769 DOI: 10.3390/biology1020395] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 07/27/2012] [Revised: 08/22/2012] [Accepted: 08/31/2012] [Indexed: 01/26/2023]
Abstract
Transposable elements (TEs) are common mobile DNA elements present in nearly all genomes. Since the movement of TEs within a genome can sometimes have phenotypic consequences, an accurate report of TE actions is desirable. To this end, we developed TE-Locate, a computational tool that uses paired-end reads to identify the novel locations of known TEs. TE-Locate can utilize either a database of TE sequences or annotated TEs within the reference sequence of interest. This makes TE-Locate useful in the search for any mobile sequence, including retrotransposed gene copies. One major concern is to act on the correct hierarchy level, thereby avoiding the incorrect calling of a single insertion as multiple events of TEs with high sequence similarity. We used the (super)family level, but TE-Locate can also use any other level, right down to the individual transposable element. As an example of analysis with TE-Locate, we used the Swedish population in the 1,001 Arabidopsis genomes project, and presented the biological insights gained from the novel TEs, including the association between different TE superfamilies. The program is freely available, and the URL is provided at the end of the paper.
Affiliation(s)
- Alexander Platzer
- Gregor Mendel Institute (GMI), Dr. Bohr-Gasse 3, 1030 Vienna, Austria.
- Quan Long
- Gregor Mendel Institute (GMI), Dr. Bohr-Gasse 3, 1030 Vienna, Austria.
13
Schönberger J, Hammes UZ, Dresselhaus T. In vivo visualization of RNA in plants cells using the λN₂₂ system and a GATEWAY-compatible vector series for candidate RNAs. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2012; 71:173-81. [PMID: 22268772 DOI: 10.1111/j.1365-313x.2012.04923.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 05/11/2023]
Abstract
The past decade has seen a tremendous increase in RNA research, which has demonstrated that RNAs are involved in many more processes than were previously thought. The dynamics of RNA synthesis towards their regulated activity requires the interplay of RNAs with numerous RNA binding proteins (RBPs). The localization of RNA, a mechanism for controlling translation in a spatial and temporal fashion, requires processing and assembly of RNA into transport granules in the nucleus, transport towards cytoplasmic destinations and regulation of its activity. Compared with animal model systems little is known about RNA dynamics and motility in plants. Commonly used methods to study RNA transport and localization are time-consuming, and require expensive equipment and a high level of experimental skill. Here, we introduce the λN₂₂ RNA stem-loop binding system for the in vivo visualization of RNA in plant cells. The λN₂₂ system consists of two components: the λN₂₂ RNA binding peptide and the corresponding box-B stem loops. We generated fusions of λN₂₂ to different fluorophores and a GATEWAY vector series for the simple fusion of any target RNA 5' or 3' to box-B stem loops. We show that the λN₂₂ system can be used to detect RNAs in transient expression assays, and that it offers advantages compared with the previously described MS2 system. Furthermore, the λN₂₂ system can be used in combination with the MS2 system to visualize different RNAs simultaneously in the same cell. The toolbox of vectors generated for both systems is easy to use and promises significant progress in our understanding of RNA transport and localization in plant cells.
Affiliation(s)
- Johannes Schönberger
- Cell Biology and Plant Biochemistry, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany
14
Gulledge AA, Roberts AD, Vora H, Patel K, Loraine AE. Mining Arabidopsis thaliana RNA-seq data with Integrated Genome Browser reveals stress-induced alternative splicing of the putative splicing regulator SR45a. AMERICAN JOURNAL OF BOTANY 2012; 99:219-31. [PMID: 22291167 DOI: 10.3732/ajb.1100355] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 05/24/2023]
Abstract
PREMISE OF THE STUDY High-throughput sequencing of cDNA libraries prepared from diverse samples (RNA-seq) can reveal genome-wide changes in alternative splicing. Using RNA-seq data to assess splicing at the level of individual genes requires the ability to visualize read alignments alongside genomic annotations. To meet this need, we added RNA-seq visualization capability to Integrated Genome Browser (IGB), a free desktop genome visualization tool. To illustrate this capability, we present an in-depth analysis of abiotic stresses and their effects on alternative splicing of SR45a (AT1G07350), a putative splicing regulator from Arabidopsis thaliana. METHODS cDNA libraries prepared from Arabidopsis plants that were subjected to heat and dehydration stresses were sequenced on an Illumina GAIIx sequencer, yielding more than 511 million high-quality 75-base, single-end sequence reads. Reads were aligned onto the reference genome and visualized in IGB. KEY RESULTS Using IGB, we confirmed exon-skipping alternative splicing in SR45a. Exon-skipped variant AT1G07350.1 encodes full-length SR45a protein with intact RS and RNA recognition motifs, while nonskipped variant AT1G07350.2 lacks the C-terminal RS region due to a frameshift in the alternative exon. Heat and drought stresses increased both transcript abundance and the proportion of exon-skipped transcripts encoding the full-length protein. We identified new splice sites and observed frequent intron retention flanking the alternative exon. CONCLUSIONS This study underlines the importance of visual inspection of RNA-seq alignments when investigating alternatively spliced genes. We showed that heat and dehydration stresses increase overall abundance of SR45a mRNA while also increasing production of transcripts encoding the full-length SR45a protein relative to other splice variants.
Affiliation(s)
- Alyssa A Gulledge
- Department of Bioinformatics and Genomics, North Carolina Research Campus, University of North Carolina at Charlotte, 600 Laureate Way, Kannapolis, North Carolina 28081, USA
|
15
|
Bielewicz D, Dolata J, Zielezinski A, Alaba S, Szarzynska B, Szczesniak MW, Jarmolowski A, Szweykowska-Kulinska Z, Karlowski WM. mirEX: a platform for comparative exploration of plant pri-miRNA expression data. Nucleic Acids Res 2011; 40:D191-7. [PMID: 22013167 PMCID: PMC3245179 DOI: 10.1093/nar/gkr878] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022] Open
Abstract
mirEX is a comprehensive platform for comparative analysis of primary microRNA expression data. RT–qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive, user-friendly interface. A new way of accessing gene expression data in mirEX includes a simple mouse-operated querying system and dynamic graphs for data mining analyses. In contrast to other publicly available databases, the mirEX interface allows a simultaneous comparison of expression levels between various microRNA genes in diverse organs and developmental stages. Currently, mirEX integrates information about the expression profile of 190 Arabidopsis thaliana pri-miRNAs in seven different developmental stages: seeds, seedlings and various organs of mature plants. Additionally, by providing RNA structural models, publicly available deep sequencing results, experimental procedure details and careful selection of auxiliary data in the form of web links, mirEX can function as a one-stop solution for Arabidopsis microRNA information. A web-based mirEX interface can be accessed at http://bioinfo.amu.edu.pl/mirex.
Affiliation(s)
- Dawid Bielewicz
- Department of Gene Expression, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
|
16
|
Caldana C, Degenkolbe T, Cuadros-Inostroza A, Klie S, Sulpice R, Leisse A, Steinhauser D, Fernie AR, Willmitzer L, Hannah MA. High-density kinetic analysis of the metabolomic and transcriptomic response of Arabidopsis to eight environmental conditions. The Plant Journal: for Cell and Molecular Biology 2011; 67:869-84. [PMID: 21575090 DOI: 10.1111/j.1365-313x.2011.04640.x] [Citation(s) in RCA: 166] [Impact Index Per Article: 12.8] [Indexed: 05/22/2023]
Abstract
The time-resolved response of Arabidopsis thaliana towards changing light and/or temperature at the transcriptome and metabolome level is presented. Plants grown at 21°C with a light intensity of 150 μE m⁻² sec⁻¹ were either kept at this condition or transferred into seven different environments (4°C, darkness; 21°C, darkness; 32°C, darkness; 4°C, 85 μE m⁻² sec⁻¹; 21°C, 75 μE m⁻² sec⁻¹; 21°C, 300 μE m⁻² sec⁻¹; 32°C, 150 μE m⁻² sec⁻¹). Samples were taken before (0 min) and at 22 time points after transfer resulting in (8×) 22 time points covering both a linear and a logarithmic time series totaling 177 states. Hierarchical cluster analysis shows that individual conditions (defined by temperature and light) diverge into distinct trajectories at condition-dependent times and that the metabolome follows different kinetics from the transcriptome. The metabolic responses are initially faster than the transcriptional responses. Gene Ontology over-representation analysis identifies a common response for all changed conditions at the transcriptome level during the early response phase (5-60 min). Metabolic networks reconstructed via metabolite-metabolite correlations reveal extensive environment-specific rewiring. Detailed analysis identifies conditional connections between amino acids and intermediates of the tricarboxylic acid cycle. Parallel analysis of transcriptional changes strongly supports a model in which, in the absence of photosynthesis at normal/high temperatures, protein degradation occurs rapidly and subsequent amino acid catabolism serves as the main cellular energy supply. These results thus demonstrate the engagement of the electron transfer flavoprotein system under short-term environmental perturbations.
Affiliation(s)
- Camila Caldana
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
|
17
|
Remmerie N, De Vijlder T, Laukens K, Dang TH, Lemière F, Mertens I, Valkenborg D, Blust R, Witters E. Next generation functional proteomics in non-model plants: A survey on techniques and applications for the analysis of protein complexes and post-translational modifications. Phytochemistry 2011; 72:1192-218. [PMID: 21345472 DOI: 10.1016/j.phytochem.2011.01.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 09/06/2010] [Revised: 11/21/2010] [Accepted: 01/03/2011] [Indexed: 05/11/2023]
Abstract
The congruent development of computational technology, bioinformatics and analytical instrumentation makes proteomics ready for the next leap. Present-day state-of-the-art proteomics has grown from a descriptive method into a full stakeholder in systems biology. High-throughput and genome-wide studies are now made at the functional level. These include quantitative aspects, functional aspects with respect to protein interactions as well as post-translational modifications, and advanced computational methods that aid in predicting protein function and mapping these functionalities across species boundaries. This review gives an overview of the current status of these aspects in plant studies, with special attention to non-genomic model plants.
Affiliation(s)
- Noor Remmerie
- Center for Proteomics, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
|
18
|
Fernandez-Calvino L, Faulkner C, Walshaw J, Saalbach G, Bayer E, Benitez-Alfonso Y, Maule A. Arabidopsis plasmodesmal proteome. PLoS One 2011; 6:e18880. [PMID: 21533090 PMCID: PMC3080382 DOI: 10.1371/journal.pone.0018880] [Citation(s) in RCA: 191] [Impact Index Per Article: 14.7] [Received: 12/06/2010] [Accepted: 03/11/2011] [Indexed: 11/26/2022] Open
Abstract
The multicellular nature of plants requires that cells should communicate in order to coordinate essential functions. This is achieved in part by molecular flux through pores in the cell wall, called plasmodesmata. We describe the proteomic analysis of plasmodesmata purified from the walls of Arabidopsis suspension cells. Isolated plasmodesmata were seen as membrane-rich structures largely devoid of immunoreactive markers for the plasma membrane, endoplasmic reticulum and cytoplasmic components. Using nano-liquid chromatography and an Orbitrap ion-trap tandem mass spectrometer, 1341 proteins were identified. We refer to this list as the plasmodesmata- or PD-proteome. Relative to other cell wall proteomes, the PD-proteome is depleted in wall proteins and enriched for membrane proteins, but still has a significant number (35%) of putative cytoplasmic contaminants, probably reflecting the sensitivity of the proteomic detection system. To validate the PD-proteome we searched for known plasmodesmal proteins and used molecular and cell biological techniques to identify novel putative plasmodesmal proteins from a small subset of candidates. The PD-proteome contained known plasmodesmal proteins and some inferred plasmodesmal proteins, based upon sequence or functional homology with examples identified in different plant systems. Many of these had a membrane association reflecting the membranous nature of isolated structures. Exploiting this connection we analysed a sample of the abundant receptor-like class of membrane proteins and a small random selection of other membrane proteins for their ability to target plasmodesmata as fluorescently-tagged fusion proteins. From 15 candidates we identified three receptor-like kinases, a tetraspanin and a protein of unknown function as novel potential plasmodesmal proteins. 
Together with published work, these data suggest that the membranous elements in plasmodesmata may be rich in receptor-like functions, and they validate the content of the PD-proteome as a valuable resource for the further uncovering of the structure and function of plasmodesmata as key components in cell-to-cell communication in plants.
Affiliation(s)
- Christine Faulkner
- John Innes Centre, Norwich Research Park, Colney, Norwich, United Kingdom
- John Walshaw
- John Innes Centre, Norwich Research Park, Colney, Norwich, United Kingdom
- Gerhard Saalbach
- John Innes Centre, Norwich Research Park, Colney, Norwich, United Kingdom
- Emmanuelle Bayer
- CNRS - Laboratoire de Biogenèse Membranaire, UMR5200, Bordeaux, France
- Andrew Maule
- John Innes Centre, Norwich Research Park, Colney, Norwich, United Kingdom
|
19
|
Isokpehi RD, Simmons SS, Cohly HHP, Ekunwe SIN, Begonia GB, Ayensu WK. Identification of drought-responsive universal stress proteins in viridiplantae. Bioinform Biol Insights 2011; 5:41-58. [PMID: 21423406 PMCID: PMC3045048 DOI: 10.4137/bbi.s6061] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 01/10/2023] Open
Abstract
Genes encoding proteins that contain the universal stress protein (USP) domain are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses. Specifically in plants, drought tolerance is a desirable phenotype. However, few focused and organized functional genomic datasets exist on drought-responsive plant USP genes to facilitate their characterization. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Our bioinformatics approach retrieved, mined and integrated comprehensive functional annotation data on 511 protein and 1561 EST sequences from 161 viridiplantae taxa. A total of 32 drought-responsive ESTs from 7 plant genera (Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum) were identified. Two Arabidopsis USP genes, At3g62550 and At3g53990, that encode an ATP-binding motif were up-regulated in a drought microarray dataset. Further, a dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from bread wheat (BE604157), soybean (BM887317) and maritime pine (BX682209). The SSR sequence types were CAG, ATA and AT, respectively. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.
|