1. Pelletier G. Michel Caboche, an outstanding plant molecular and cell biologist. C R Biol 2021; 344:209-218. [DOI: 10.5802/crbiol.57] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 06/21/2021] [Indexed: 11/24/2022]
2. Boutet E, Lieberherr D, Tognolli M, Schneider M, Bansal P, Bridge AJ, Poux S, Bougueleret L, Xenarios I. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 2016; 1374:23-54. [PMID: 26519399 DOI: 10.1007/978-1-4939-3167-5_2] [Citation(s) in RCA: 450] [Impact Index Per Article: 56.3] [Indexed: 05/08/2023]
Abstract
The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available, expertly and manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the framework of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.
Affiliation(s)
- Emmanuel Boutet
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Damien Lieberherr
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Michael Tognolli
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Michel Schneider
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Parit Bansal
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Alan J Bridge
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Sylvain Poux
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Lydie Bougueleret
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
- Ioannis Xenarios
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, rue Michel Servet 1, CH-1211, Geneva 4, Switzerland
  - University of Lausanne, CIG, Lausanne, 1015, Switzerland
3. Malhotra S, Sowdhamini R. Genome-wide survey of DNA-binding proteins in Arabidopsis thaliana: analysis of distribution and functions. Nucleic Acids Res 2013; 41:7212-9. [PMID: 23775796 PMCID: PMC3753632 DOI: 10.1093/nar/gkt505] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/04/2022]
Abstract
The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. Performing a comprehensive survey of the sequenced genomes for DNA-binding proteins (DBPs) will help in understanding their distribution and the associated functions in a particular genome. Availability of fully sequenced genome of Arabidopsis thaliana enables the review of distribution of DBPs in this model plant genome. We used profiles of both structure and sequence-based DNA-binding families, derived from PDB and PFam databases, to perform the survey. This resulted in 4471 proteins, identified as DNA-binding in Arabidopsis genome, which are distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair were particularly studied and noted for their functional mapping. The functions observed to be overrepresented in the plant genome harbour DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.
Affiliation(s)
- Sony Malhotra
  - National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
4. Fawal N, Li Q, Savelli B, Brette M, Passaia G, Fabre M, Mathé C, Dunand C. PeroxiBase: a database for large-scale evolutionary analysis of peroxidases. Nucleic Acids Res 2012. [PMID: 23180785 PMCID: PMC3531118 DOI: 10.1093/nar/gks1083] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.0] [Indexed: 12/14/2022]
Abstract
The PeroxiBase (http://peroxibase.toulouse.inra.fr/) is a specialized database devoted to peroxidases' families, which are major actors of stress responses. In addition to the increasing number of sequences and the complete modification of the Web interface, new analysis tools and functionalities have been developed since the previous publication in the NAR database issue. Nucleotide sequences and graphical representation of the gene structure can now be included for entries containing genomic cross-references. An expert semi-automatic annotation strategy is being developed to generate new entries from genomic sequences and from EST libraries. Plus, new internal and automatic controls have been included to improve the quality of the entries. To compare gene structure organization among families' members, two new tools are available, CIWOG to detect common introns and GECA to visualize gene structure overlaid with sequence conservation. The multicriteria search tool was greatly improved to allow simple and combined queries. After such requests or a BLAST search, different analysis processes are suggested, such as multiple alignments with ClustalW or MAFFT, a platform for phylogenetic analysis and GECA's display in association with a phylogenetic tree. Finally, we updated our family specific profiles implemented in the PeroxiScan tool and made new profiles to consider new sub-families.
Affiliation(s)
- Nizar Fawal
  - Université de Toulouse, UPS, UMR 5546, Laboratoire de Recherche en Sciences Végétales, France
5. Dèrozier S, Samson F, Tamby JP, Guichard C, Brunaud V, Grevet P, Gagnot S, Label P, Leplé JC, Lecharny A, Aubourg S. Exploration of plant genomes in the FLAGdb++ environment. Plant Methods 2011; 7:8. [PMID: 21447150 PMCID: PMC3073958 DOI: 10.1186/1746-4811-7-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/17/2010] [Accepted: 03/29/2011] [Indexed: 05/04/2023]
Abstract
BACKGROUND: In the contexts of genomics, post-genomics and systems biology approaches, data integration presents a major concern. Databases provide crucial solutions: they store, organize and allow information to be queried, they enhance the visibility of newly produced data by comparing them with previously published results, and facilitate the exploration and development of both existing hypotheses and new ideas.
RESULTS: The FLAGdb++ information system was developed with the aim of using whole plant genomes as physical references in order to gather and merge available genomic data from in silico or experimental approaches. Available through a JAVA application, original interfaces and tools assist the functional study of plant genes by considering them in their specific context: chromosome, gene family, orthology group, co-expression cluster and functional network. FLAGdb++ is mainly dedicated to the exploration of large gene groups in order to decipher functional connections, to highlight shared or specific structural or functional features, and to facilitate translational tasks between plant species (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera).
CONCLUSION: Combining original data with the output of experts and graphical displays that differ from classical plant genome browsers, FLAGdb++ presents a powerful complementary tool for exploring plant genomes and exploiting structural and functional resources, without the need for computer programming knowledge. First launched in 2002, a 15th version of FLAGdb++ is now available and comprises four model plant genomes and over eight million genomic features.
Affiliation(s)
- Sandra Dèrozier
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
  - Unité Mathématique Informatique et Génome (MIG), UR INRA 1077, Domaine de Vilvert, F-78352 Jouy-en-Josas Cedex, France
- Franck Samson
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
  - Unité Mathématique Informatique et Génome (MIG), UR INRA 1077, Domaine de Vilvert, F-78352 Jouy-en-Josas Cedex, France
- Jean-Philippe Tamby
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
- Cécile Guichard
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
- Véronique Brunaud
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
- Philippe Grevet
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
- Séverine Gagnot
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
  - Laboratoire de Chimie Bactérienne (LCB), UPR CNRS 9043 - IFR 88, 31 Chemin Joseph Aiguier, F-13009 Marseille, France
- Philippe Label
  - Unité Amélioration, Génétique et Physiologie Forestières (UAGPF), UR INRA 588, 2163 avenue de la Pomme de Pin, CS 4001 Ardon, F-45075 Orléans, France
- Jean-Charles Leplé
  - Unité Amélioration, Génétique et Physiologie Forestières (UAGPF), UR INRA 588, 2163 avenue de la Pomme de Pin, CS 4001 Ardon, F-45075 Orléans, France
- Alain Lecharny
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
- Sébastien Aubourg
  - Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - Université d'Evry Val d'Essonne - ERL CNRS 8196, 2 Rue Gaston Crémieux, CP 5708, F-91057 Evry Cedex, France
6. Omelyanchuk NA, Mironova VV, Kolchanov NA. Plant developmental genetics: Integrating data from different experiments in databases. Russ J Genet 2009. [DOI: 10.1134/s1022795409110052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
7.
Abstract
The accurate identification of exons and introns that comprise a complete plant gene structure can be a time-consuming and challenging task. Novel Web-based tools facilitate the process by providing a convenient interface to current transcript evidence, and portals to relevant bioinformatics software. With a few keystrokes, the user can explore alternative transcript assemblies and, for example, select for annotation those that are clearly supported by transcript evidence and similarity to known genes. The implementation of the tool at the PlantGDB resource also allows immediate communication of the novel annotations to the community through Web display.
Affiliation(s)
- Volker Brendel
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
8. Baker EJ, Lin GN, Liu H, Kosuri R. NFU-Enabled FASTA: moving bioinformatics applications onto wide area networks. Source Code for Biology and Medicine 2007; 2:8. [PMID: 18039379 PMCID: PMC2211279 DOI: 10.1186/1751-0473-2-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2007] [Accepted: 11/26/2007] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Advances in Internet technologies have allowed life science researchers to reach beyond the lab-centric research paradigm to create distributed collaborations. Of the existing technologies that support distributed collaborations, there are currently none that simultaneously support data storage and computation as a shared network resource, enabling computational burden to be wholly removed from participating clients. Software using computation-enabled logistical networking components of the Internet Backplane Protocol provides a suitable means to accomplish these tasks. Here, we demonstrate software that enables this approach by distributing both the FASTA algorithm and appropriate data sets within the framework of a wide area network.
RESULTS: For large datasets, computation-enabled logistical networks provide a significant reduction in FASTA algorithm running time over local and non-distributed logistical networking frameworks. We also find that genome-scale sizes of the stored data are easily adaptable to logistical networks.
CONCLUSION: Network function unit-enabled Internet Backplane Protocol effectively distributes FASTA algorithm computation over large data sets stored within the scalable network. In situations where computation is subject to parallel solution over very large data sets, this approach provides a means to allow distributed collaborators access to a shared storage resource capable of storing the large volumes of data equated with modern life science. In addition, it provides a computation framework that removes the burden of computation from the client and places it within the network.
9.
Abstract
The Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI), and the Protein Information Resource (PIR) form the Universal Protein Resource (UniProt) consortium. Its main goal is to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB) and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). (1) UniProtKB is a comprehensive protein sequence knowledgebase that consists of two sections: UniProtKB/Swiss-Prot, which contains manually annotated entries, and UniProtKB/TrEMBL, which contains computer-annotated entries. UniProtKB/Swiss-Prot entries contain information curated by biologists and provide users with cross-links to about 100 external databases and with access to additional information or tools. (2) The UniRef databases (UniRef100, UniRef90, and UniRef50) define clusters of protein sequences that share 100, 90, or 50% identity. (3) The UniParc database stores and maps all publicly available protein sequence data, including obsolete data excluded from UniProtKB. The UniProt databases can be accessed online (http://www.uniprot.org/) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every 2 weeks. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry, paying particular attention to the specificities of plant protein annotation. We will also present some of the tools and databases that are linked to each entry.
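The identity cut-offs named in this abstract (100, 90 and 50%) can be illustrated with a minimal Python sketch. It is only an illustration of the thresholds, not the UniRef pipeline: it assumes two sequences that have already been aligned to equal length, and the function names `percent_identity` and `uniref_levels` are hypothetical. The actual UniRef clustering involves dedicated clustering software and criteria that the abstract does not describe.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns holding the same residue.

    Assumes `a` and `b` are pre-aligned to equal length; columns where
    the shared character is a gap ('-') are not counted as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def uniref_levels(identity: float) -> List[str]:
    """Names of the identity cut-offs (from the abstract) that a pair meets."""
    thresholds = [("UniRef100", 100.0), ("UniRef90", 90.0), ("UniRef50", 50.0)]
    return [name for name, cutoff in thresholds if identity >= cutoff]


# Toy example: two 10-residue sequences differing at the last position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
levels = uniref_levels(identity)
print(identity, levels)
```

A pair that differs at one position in ten meets the 90% and 50% cut-offs but not the 100% one, which is why the three UniRef sets nest from the most to the least stringent.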
Affiliation(s)
- Emmanuel Boutet
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
10. Wise RP, Moscou MJ, Bogdanove AJ, Whitham SA. Transcript profiling in host-pathogen interactions. Annual Review of Phytopathology 2007; 45:329-69. [PMID: 17480183 DOI: 10.1146/annurev.phyto.45.011107.143944] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.9] [Indexed: 05/08/2023]
Abstract
Using genomic technologies, it is now possible to address research hypotheses in the context of entire developmental or biochemical pathways, gene networks, and chromosomal location of relevant genes and their inferred evolutionary history. Through a range of platforms, researchers can survey an entire transcriptome under a variety of experimental and field conditions. Interpretation of such data has led to new insights and revealed previously undescribed phenomena. In the area of plant-pathogen interactions, transcript profiling has provided unparalleled perception into the mechanisms underlying gene-for-gene resistance and basal defense, host vs nonhost resistance, biotrophy vs necrotrophy, and pathogenicity of vascular vs nonvascular pathogens, among many others. In this way, genomic technologies have facilitated a system-wide approach to unifying themes and unique features in the interactions of hosts and pathogens.
Affiliation(s)
- Roger P Wise
  - Corn Insects and Crop Genetics Research, USDA-ARS, Iowa State University, Ames, Iowa 50011-1020, USA
11. Rivals E, Bruyère C, Toffano-Nioche C, Lecharny A. Formation of the Arabidopsis pentatricopeptide repeat family. Plant Physiology 2006; 141:825-39. [PMID: 16825340 PMCID: PMC1489915 DOI: 10.1104/pp.106.077826] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 05/10/2023]
Abstract
In Arabidopsis (Arabidopsis thaliana) the 466 pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins with essential roles in organelles. Roughly half of the PPR proteins form the plant combinatorial and modular protein (PCMP) subfamily, which is land-plant specific. PCMPs exhibit a large and variable tandem repeat of a standard pattern of three PPR variant motifs. The association or not of this repeat with three non-PPR motifs at their C terminus defines four distinct classes of PCMPs. The highly structured arrangement of these motifs and the similar repartition of these arrangements in the four classes suggest precise relationships between motif organization and substrate specificity. This study is an attempt to reconstruct an evolutionary scenario of the PCMP family. We developed an innovative approach based on comparisons of the proteins at two levels: namely the succession of motifs along the protein and the amino acid sequence of the motifs. It enabled us to infer evolutionary relationships between proteins as well as between the inter- and intraprotein repeats. First, we observed a polarized elongation of the repeat from the C terminus toward the N-terminal region, suggesting local recombinations of motifs. Second, the most N-terminal PPR triple motif proved to evolve under different constraints than the remaining repeat. Altogether, the evidence indicates different evolution for the PPR region and the C-terminal one in PCMPs, which points to distinct functions for these regions. Moreover, local sequence homogeneity observed across PCMP classes may be due to interclass shuffling of motifs, or to deletions/insertions of non-PPR motifs at the C terminus.
Affiliation(s)
- Eric Rivals
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5506, Université de Montpellier II, 34392 Montpellier cedex 5, France
12. Omelyanchuk NA, Mironova VV, Zalevsky EM, Shamov IS, Poplavsky AS, Podkolodny NL, Ponomaryov DK, Nikolaev SV, Mjolsness ED, Meyerowitz EM, Kolchanov NA. A systems approach to morphogenesis in Arabidopsis thaliana: I. AGNS database. Biophysics (Nagoya-shi) 2006. [DOI: 10.1134/s0006350906070165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
13. Schneider M, Bairoch A, Wu CH, Apweiler R. Plant protein annotation in the UniProt Knowledgebase. Plant Physiology 2005; 138:59-66. [PMID: 15888679 PMCID: PMC1104161 DOI: 10.1104/pp.104.058933] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 05/02/2023]
Abstract
The Swiss-Prot, TrEMBL, Protein Information Resource (PIR), and DNA Data Bank of Japan (DDBJ) protein database activities have united to form the Universal Protein Resource (UniProt) Consortium. UniProt presents three database layers: the UniProt Archive, the UniProt Knowledgebase (UniProtKB), and the UniProt Reference Clusters. The UniProtKB consists of two sections: UniProtKB/Swiss-Prot (fully manually curated entries) and UniProtKB/TrEMBL (automated annotation, classification and extensive cross-references). New releases are published fortnightly. A specific Plant Proteome Annotation Program (http://www.expasy.org/sprot/ppap/) was initiated to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Through UniProt, our aim is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information that will allow the plant community to fully explore and utilize the wealth of information available for both plant and non-plant model organisms.
Affiliation(s)
- Michel Schneider
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, University of Geneva, 1211 Geneva 4, Switzerland