1
Abstract
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
2
MeRy-B: a web knowledgebase for the storage, visualization, analysis and annotation of plant NMR metabolomic profiles. BMC Plant Biol 2011; 11:104. [PMID: 21668943 PMCID: PMC3141636 DOI: 10.1186/1471-2229-11-104] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 12/17/2010] [Accepted: 06/13/2011] [Indexed: 05/13/2023]
Abstract
BACKGROUND Improvements in the techniques for metabolomics analyses and growing interest in metabolomic approaches are resulting in the generation of increasing numbers of metabolomic profiles. Platforms are required for profile management, as a function of experimental design, and for metabolite identification, to facilitate the mining of the corresponding data. Various databases have been created, including organism-specific knowledgebases and analytical technique-specific spectral databases. However, there is currently no platform meeting the requirements for both profile management and metabolite identification for nuclear magnetic resonance (NMR) experiments. DESCRIPTION MeRy-B, the first platform for plant ¹H-NMR metabolomic profiles, is designed (i) to provide a knowledgebase of curated plant profiles and metabolites obtained by NMR, together with the corresponding experimental and analytical metadata, (ii) to support queries and visualization of the data, (iii) to discriminate between profiles with spectrum visualization tools and statistical analysis, and (iv) to facilitate compound identification. It contains lists of plant metabolites and unknown compounds, with information about experimental conditions, the factors studied and metabolite concentrations for several plant species, compiled from more than one thousand annotated NMR profiles for various organs or tissues. CONCLUSION MeRy-B manages all the data generated by NMR-based plant metabolomics experiments, from description of the biological source to identification of the metabolites and determination of their concentrations. It is the first database allowing the display and overlay of NMR metabolomic profiles selected through queries on data or metadata. MeRy-B is available from http://www.cbib.u-bordeaux2.fr/MERYB/index.php.
3
Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics 2010; 11:650. [PMID: 21092232 PMCID: PMC3017864 DOI: 10.1186/1471-2164-11-650] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 05/28/2010] [Accepted: 11/23/2010] [Indexed: 01/04/2023]
Abstract
BACKGROUND The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, in which geneticists, ecophysiologists, molecular biologists and ecologists join their efforts to understand, monitor and predict functional genetic diversity. RESULTS We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and allowed us to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html. CONCLUSIONS This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
4
A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study. BMC Genomics 2010; 11:570. [PMID: 20950475 PMCID: PMC3091719 DOI: 10.1186/1471-2164-11-570] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.3] [Received: 12/07/2009] [Accepted: 10/15/2010] [Indexed: 08/14/2023]
Abstract
Background Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of an SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs to Castanea sativa (chestnut). Results A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall, 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental trees, for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a species phylogenetically related to oak, was higher. Conclusion We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
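The SSR mining step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name find_ssrs and the repeat thresholds (at least 6 copies for dinucleotides, 5 for trinucleotides) are assumptions chosen only for illustration, and only perfect repeats are detected.

```python
import re

# Hypothetical thresholds (assumption): minimum number of repeat copies per motif length.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} demands at least n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence: an (AG)7 dinucleotide repeat followed by a (CTT)5 trinucleotide repeat.
example = "TT" + "AG" * 7 + "GC" + "CTT" * 5 + "A"
ssrs = find_ssrs(example)
```

In a real survey, each unigene would be scanned this way and the fraction of unigenes with at least one hit reported, which is the kind of figure the abstract gives (18.6%).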
5
6
An integrative genomics approach for deciphering the complex interactions between ascorbate metabolism and fruit growth and composition in tomato. C R Biol 2009; 332:1007-21. [PMID: 19909923 DOI: 10.1016/j.crvi.2009.09.013] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 02/04/2023]
Abstract
Very few reports have studied the interactions between ascorbate and fruit metabolism. In order to get insights into the complex relationships between ascorbate biosynthesis/recycling and other metabolic pathways in the fruit, we undertook a fruit systems biology approach. To this end, we have produced tomato transgenic lines altered in ascorbate content and redox ratio by RNAi-targeting several key enzymes involved in ascorbate biosynthesis (2 enzymes) and recycling (2 enzymes). In the VTC (ViTamin C) Fruit project, we then generated phenotypic and genomic (transcriptome, proteome, metabolome) data from wild type and mutant tomato fruit at two stages of fruit development, and developed or implemented statistical and bioinformatic tools as a web application (named VTC Tool box) necessary to store, analyse and integrate experimental data in tomato. By using Kohonen's self-organizing maps (SOMs) to cluster the biological data, pair-wise Pearson correlation analyses and simultaneous visualization of transcript/protein and metabolites (MapMan), this approach allowed us to uncover major relationships between ascorbate and other metabolic pathways.
7
Life on arginine for Mycoplasma hominis: clues from its minimal genome and comparison with other human urogenital mycoplasmas. PLoS Genet 2009; 5:e1000677. [PMID: 19816563 PMCID: PMC2751442 DOI: 10.1371/journal.pgen.1000677] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.6] [Received: 05/19/2009] [Accepted: 09/07/2009] [Indexed: 12/24/2022]
Abstract
Mycoplasma hominis is an opportunistic human mycoplasma. Two other pathogenic human species, M. genitalium and Ureaplasma parvum, reside within the same natural niche as M. hominis: the urogenital tract. These three species have overlapping, but distinct, pathogenic roles. They have minimal genomes and, thus, reduced metabolic capabilities characterized by distinct energy-generating pathways. Analysis of the M. hominis PG21 genome sequence revealed that it is the second smallest genome among self-replicating free-living organisms (665,445 bp, 537 coding sequences (CDSs)). Five clusters of genes were predicted to have undergone horizontal gene transfer (HGT) between M. hominis and the phylogenetically distant U. parvum species. We reconstructed M. hominis metabolic pathways from the predicted genes, with particular emphasis on energy-generating pathways. The Embden–Meyerhof–Parnas pathway was incomplete, with a single enzyme absent. We identified the three proteins constituting the arginine dihydrolase pathway. This pathway was found to be essential to promote growth in vivo. The predicted presence of dimethylarginine dimethylaminohydrolase suggested that arginine catabolism is more complex than initially described. This enzyme may have been acquired by HGT from non-mollicute bacteria. Comparison of the three minimal mollicute genomes showed that 247 CDSs were common to all three genomes, whereas 220 CDSs were specific to M. hominis, 172 CDSs were specific to M. genitalium, and 280 CDSs were specific to U. parvum. Within these species-specific genes, two major sets of genes could be identified: one including genes involved in various energy-generating pathways, depending on the energy source used (glucose, urea, or arginine), and another involved in cytadherence and virulence. Therefore, a minimal mycoplasma cell, not including cytadherence and virulence-related genes, could be envisaged as containing a core genome (247 genes), plus a set of genes required for providing energy. For M. hominis, this set would include 247+9 genes, resulting in a theoretical minimal genome of 256 genes.
Mycoplasma hominis, M. genitalium, and Ureaplasma parvum are human pathogenic bacteria that colonize the urogenital tract. They have minimal genomes, and thus have a minimal metabolic capacity. However, they have distinct energy-generating pathways and distinct pathogenic roles. We compared the genomes of these three human pathogen minimal species, providing further insight into the composition of hypothetical minimal gene sets needed for life. To this end, we sequenced the whole M. hominis genome and reconstructed its energy-generating pathways from gene predictions. Its unusual major energy-producing pathway through arginine hydrolysis was confirmed in both genome analyses and in vivo assays. Our findings suggest that M. hominis and U. parvum underwent genetic exchange, probably while sharing a common host. We proposed a set of genes likely to represent a minimal genome. For M. hominis, this minimal genome, not including cytadherence and virulence-related genes, can be defined as comprising the 247 genes shared by the three minimal genital mollicutes, combined with a set of nine genes needed for energy production for cell metabolism. This study provides insight for the synthesis of artificial genomes.
8
A community standard format for the representation of protein affinity reagents. Mol Cell Proteomics 2009; 9:1-10. [PMID: 19674966 DOI: 10.1074/mcp.m900185-mcp200] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. 
Further information and documentation are available on the PSI-PAR web site.
9
Observing metabolic functions at the genome scale. Genome Biol 2008; 8:R123. [PMID: 17594483 PMCID: PMC2394767 DOI: 10.1186/gb-2007-8-6-r123] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 03/21/2007] [Revised: 05/30/2007] [Accepted: 06/26/2007] [Indexed: 01/08/2023]
Abstract
A modular approach is presented that allows the observation of the transcriptional activity of metabolic functions at the genome scale. Background High-throughput techniques have multiplied the amount and the types of available biological data, and for the first time achieving a global comprehension of the physiology of biological cells has become an achievable goal. This aim requires the integration of large amounts of heterogeneous data at different scales. It is notably necessary to extend the traditional focus on genomic data towards a truly functional focus, where the activity of cells is described in terms of actual metabolic processes performing the functions necessary for cells to live. Results In this work, we present a new approach for metabolic analysis that allows us to observe the transcriptional activity of metabolic functions at the genome scale. These functions are described in terms of elementary modes, which can be computed in a genome-scale model thanks to a modular approach. We exemplify this new perspective by presenting a detailed analysis of the transcriptional metabolic response of yeast cells to stress. The integration of elementary mode analysis with gene expression data allows us to identify a number of functionally induced or repressed metabolic processes in different stress conditions. The assembly of these elementary modes leads to the identification of specific metabolic backbones. Conclusion This study opens a new framework for the cell-scale analysis of metabolism, where transcriptional activity can be analyzed in terms of whole processes instead of individual genes. We furthermore show that the set of active elementary modes exhibits a highly uneven organization, where most of them conduct specialized tasks while a smaller proportion performs multi-task functions and dominates the general stress response.
10
Large-scale identification of human genes implicated in epidermal barrier function. Genome Biol 2008; 8:R107. [PMID: 17562024 PMCID: PMC2394760 DOI: 10.1186/gb-2007-8-6-r107] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Received: 03/01/2007] [Revised: 05/24/2007] [Accepted: 06/11/2007] [Indexed: 11/19/2022]
Abstract
Identification of genes expressed in epidermal granular keratinocytes by ORESTES, including a number that are highly specific for these cells. Background During epidermal differentiation, keratinocytes progressing through the suprabasal layers undergo complex and tightly regulated biochemical modifications leading to cornification and desquamation. The last living cells, the granular keratinocytes (GKs), produce almost all of the proteins and lipids required for the protective barrier function before their programmed cell death gives rise to corneocytes. We present here the first analysis of the transcriptome of human GKs, purified from healthy epidermis by an original approach. Results Using the ORESTES method, 22,585 expressed sequence tags (ESTs) were produced that matched 3,387 genes. Despite normalization provided by this method (mean 4.6 ORESTES per gene), some highly transcribed genes, including that encoding dermokine, were overrepresented. About 330 expressed genes displayed less than 100 ESTs in UniGene clusters and are most likely to be specific for GKs and potentially involved in barrier function. This hypothesis was tested by comparing the relative expression of 73 genes in the basal and granular layers of epidermis by quantitative RT-PCR. Among these, 33 were identified as new, highly specific markers of GKs, including those encoding a protease, protease inhibitors and proteins involved in lipid metabolism and transport. We identified filaggrin 2 (also called ifapsoriasin), a poorly characterized member of the epidermal differentiation complex, as well as three new lipase genes clustered with paralogous genes on chromosome 10q23.31. A new gene of unknown function, C1orf81, is specifically disrupted in the human genome by a frameshift mutation. Conclusion These data increase the present knowledge of genes responsible for the formation of the skin barrier and suggest new candidates for genodermatoses of unknown origin.
11
Being pathogenic, plastic, and sexual while living with a nearly minimal bacterial genome. PLoS Genet 2007; 3:e75. [PMID: 17511520 PMCID: PMC1868952 DOI: 10.1371/journal.pgen.0030075] [Citation(s) in RCA: 139] [Impact Index Per Article: 8.2] [Received: 10/13/2006] [Accepted: 04/02/2007] [Indexed: 11/18/2022]
Abstract
Mycoplasmas are commonly described as the simplest self-replicating organisms, whose evolution was mainly characterized by genome downsizing with a proposed evolutionary scenario similar to that of obligate intracellular bacteria such as insect endosymbionts. Thus far, analysis of mycoplasma genomes indicates a low level of horizontal gene transfer (HGT) implying that DNA acquisition is strongly limited in these minimal bacteria. In this study, the genome of the ruminant pathogen Mycoplasma agalactiae was sequenced. Comparative genomic data and phylogenetic tree reconstruction revealed that ∼18% of its small genome (877,438 bp) has undergone HGT with the phylogenetically distinct mycoides cluster, which is composed of significant ruminant pathogens. HGT involves genes often found as clusters, several of which encode lipoproteins that usually play an important role in mycoplasma–host interaction. A decayed form of a conjugative element also described in a member of the mycoides cluster was found in the M. agalactiae genome, suggesting that HGT may have occurred by mobilizing a related genetic element. The possibility of HGT events among other mycoplasmas was evaluated with the available sequenced genomes. Our data indicate marginal levels of HGT among Mycoplasma species except for those described above and, to a lesser extent, for those observed in between the two bird pathogens, M. gallisepticum and M. synoviae. This first description of large-scale HGT among mycoplasmas sharing the same ecological niche challenges the generally accepted evolutionary scenario in which gene loss is the main driving force of mycoplasma evolution. The latter clearly differs from that of other bacteria with small genomes, particularly obligate intracellular bacteria that are isolated within host cells. 
Consequently, mycoplasmas are not only able to subvert complex hosts but presumably have retained sexual competence, a trait that may prevent them from genome stasis and contribute to adaptation to new hosts.
Mycoplasmas are cell wall–lacking prokaryotes that evolved from ancestors common to Gram-positive bacteria by way of massive losses of genetic material. With their minimal genome, mycoplasmas are considered to be the simplest free-living organisms, yet several species are successful pathogens of humans and animals. In this study, we challenged the commonly accepted view in which mycoplasma evolution is driven only by genome down-sizing. Indeed, we showed that a significant number of genes underwent horizontal transfer among different mycoplasma species that share the same ruminant hosts. In these species, the occurrence of a genetic element that can promote DNA transfer via cell-to-cell contact suggests that some mycoplasmas may have retained or acquired sexual competence. Transferred genes were found to encode proteins that are likely to be associated with mycoplasma–host interactions. Sharing genetic resources via horizontal gene transfer may provide mycoplasmas with a means for adapting to new niches or to new hosts and for avoiding irreversible genome erosion.
12
Mapping the proteome of poplar and application to the discovery of drought-stress responsive proteins. Proteomics 2007; 6:6509-27. [PMID: 17163438 DOI: 10.1002/pmic.200600362] [Citation(s) in RCA: 142] [Impact Index Per Article: 8.4] [Indexed: 11/08/2022]
Abstract
Poplar is the first forest tree genome to be decoded. As an initial step to the comprehensive analysis of poplar proteome, we described reference 2-D-maps for eight tissues/organs of the plant, and the functional characterization of some proteins. A total of 398 proteins were excised from the gels. About 91.2% were identified by nanospray LC-MS/MS, based on comparison with 260,000 Populus sp. ESTs. In comparison, reliable PMFs were obtained for only 51% of the spots by MALDI-TOF-MS, from which 43% (83 spots) positively matched gene models of the Populus trichocarpa genome sequence. Among these 83 spots, 58% matched with the same proteins as identified by LC-MS/MS, 21.7% with unknown function proteins and 19.3% with completely different functions. In the second phase, we studied the effect of drought stress on poplar root and leaf proteomes. The function of up- and down-regulated proteins is discussed with respect to the physiological response of the plants and compared with transcriptomic data. Some important clues regarding the way poplar copes with water deficit were revealed.
13
ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome. Nat Methods 2007; 4:13-7. [PMID: 17195019 DOI: 10.1038/nmeth0107-13] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.1] [Indexed: 11/09/2022]
Abstract
ProteomeBinders is a new European consortium aiming to establish a comprehensive resource of well-characterized affinity reagents, including but not limited to antibodies, for analysis of the human proteome. Given the huge diversity of the proteome, the scale of the project is potentially immense but nevertheless feasible in the context of a pan-European or even worldwide coordination.
14
Abstract
Wood is one of our most important natural resources. Surprisingly, we know hardly anything about the details of the process of wood formation. The aim of this work was to describe the main proteins expressed in wood-forming tissue of a conifer species (Pinus pinaster Ait.). Using high-resolution 2-DE with a linear pH gradient ranging from 4 to 7, a total of 1039 spots were detected. Out of the 240 spots analyzed by MS/MS, 67.9% were identified, 16.7% presented no homology in the databases, and 15.4% corresponded to protein mixtures. Out of the 57 spots analyzed by MALDI-MS, only 15.8% were identified. Most of the 175 identified proteins play a role in defense (19.4%), carbohydrate (16.6%) and amino acid (14.9%) metabolism, gene and protein expression (13.1%), cytoskeleton (8%), cell wall biosynthesis (5.7%), or secondary (5.1%) and primary (4%) metabolism. A summary of the identified proteins, their putative functions, and behavior in different types of wood is presented. This information was introduced into the PROTICdb database and is accessible at http://cbib1.cbib.u-bordeaux2.fr/Protic/Protic/home/index.php. Finally, the average protein amounts were compared with their respective transcript abundances as quantified through EST counting in a cDNA library constructed with mRNA extracted from wood-forming tissue.
15
Abstract
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.
16
New strategy for the representation and the integration of biomolecular knowledge at a cellular scale. Nucleic Acids Res 2004; 32:3581-9. [PMID: 15240831 PMCID: PMC484170 DOI: 10.1093/nar/gkh681] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
The combination of sequencing and post-sequencing experimental approaches produces huge collections of data that are highly heterogeneous both in structure and in semantics. We propose a new strategy for the integration of such data. This strategy uses structured sets of sequences as a unified representation of biological information and defines a probabilistic measure of similarity between the sets. Sets can be composed of sequences that are known to have a biological relationship (e.g. proteins involved in a complex or a pathway) or that share similar values for a particular attribute (e.g. expression profile). We have developed a software tool, BlastSets, which implements this strategy. It exploits a database where the sets derived from diverse biological information can be deposited using a standard XML format. For a given query set, BlastSets returns target sets found in the database whose similarity to the query is statistically significant. The tool allowed us to automatically identify verified relationships between correlated expression profiles and biological pathways using publicly available data for Saccharomyces cerevisiae. It was also used to retrieve the members of a complex (ribosome) based on the mining of expression profiles. These first results validate the relevance of the strategy and demonstrate the promising potential of BlastSets.
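To make the idea of a probabilistic similarity between sets concrete, here is a minimal sketch. The abstract does not state which statistic BlastSets uses, so the hypergeometric tail probability below is an assumption chosen for illustration, and the names overlap_p_value and set_similarity are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, query_size, target_size, overlap):
    """P(X >= overlap) when target_size items are drawn at random
    from a universe that contains query_size 'query' items."""
    max_overlap = min(query_size, target_size)
    total = comb(universe_size, target_size)
    tail = sum(
        comb(query_size, k) * comb(universe_size - query_size, target_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def set_similarity(query, target, universe_size):
    """Score two sequence-identifier sets by the chance probability of their overlap."""
    query, target = set(query), set(target)
    return overlap_p_value(universe_size, len(query), len(target), len(query & target))


# Toy example: two 3-member sets that coincide exactly in a universe of 10 sequences.
p_identical = set_similarity({"s1", "s2", "s3"}, {"s1", "s2", "s3"}, 10)
```

A small value means the overlap is unlikely to arise by chance; a tool of this kind would report the target sets whose value falls below a chosen significance threshold.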
17
Abstract
Bacteria belonging to the class Mollicutes were among the first ones to be selected for complete genome sequencing because of the minimal size of their genomes and their pathogenicity for humans and a broad range of animals and plants. At this time, six genome sequences have been publicly released (Mycoplasma genitalium, Mycoplasma pneumoniae, Ureaplasma urealyticum-parvum, Mycoplasma pulmonis, Mycoplasma penetrans and Mycoplasma gallisepticum) and, as the number of available mollicute genomes increases, comparative genomics analysis within this model group of organisms becomes more and more instructive. However, such an analysis is difficult to carry out without a suitable platform gathering not only the original annotations but also relevant information available in public databases or obtained by applying common bioinformatics methods. With the aim of solving these difficulties, we have developed a web-accessible database named MolliGen (http://cbi.labri.fr/outils/molligen/). After selecting a set of genomes, the user can launch various types of search based on annotation, position on the chromosomes or sequence similarity. In addition, relationships of putative orthology have been precomputed to allow differential genome queries. The results are presented in table format with multiple links to public databases and to bioinformatic analyses such as multiple alignments or BLAST search. Specific tools were also developed for the graphical visualization of the results, including a multi-genome browser for displaying dynamic pictures with clickable objects and for viewing relationships of precomputed similarity. MolliGen is designed to integrate all the complete genomes of mollicutes as they become available.
|
18
|
Abstract
SUMMARY IPPRED is a web-based server that infers protein-protein interactions through homology searches between candidate proteins and proteins described as interacting. This simple inference makes it possible to propose or validate potential interactions. AVAILABILITY IPPRED is freely available at http://cbi.labri.fr/outils/ippred/.
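The inference described above can be sketched as follows. This is not IPPRED's implementation; it is a minimal illustration that assumes the homology search has already been run and that its results are available as a mapping from each candidate protein to its homologs among proteins with known interactions. All names (`infer_interactions`, `homologs`, `known_interactions`) are hypothetical.

```python
from itertools import combinations


def infer_interactions(candidates, homologs, known_interactions):
    """Propose candidate pairs whose homologs are described as interacting.

    candidates: iterable of candidate protein identifiers.
    homologs: dict mapping a candidate to an iterable of homologous
        proteins (assumed to come from a prior sequence-similarity search).
    known_interactions: iterable of (protein, protein) pairs.
    Returns a dict mapping each inferred pair to its supporting known pairs.
    """
    # Interactions are treated as undirected, so store them as frozensets.
    known = {frozenset(pair) for pair in known_interactions}
    inferred = {}
    for a, b in combinations(sorted(candidates), 2):
        support = sorted(
            (ha, hb)
            for ha in homologs.get(a, ())
            for hb in homologs.get(b, ())
            if frozenset((ha, hb)) in known
        )
        if support:
            inferred[(a, b)] = support
    return inferred
```

Keeping the supporting known pairs alongside each inferred pair lets a user check which described interaction motivated the proposal, which matches the "propose or validate" use mentioned in the summary.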
|
19
|
Analysis of the cellular functions of Escherichia coli operons and their conservation in Bacillus subtilis. J Mol Evol 2002; 55:211-21. [PMID: 12107597 DOI: 10.1007/s00239-002-2317-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Received: 12/01/2000] [Accepted: 11/05/2001] [Indexed: 10/26/2022]
Abstract
The common assumption that operons are composed of genes that cooperate in a biological process is confirmed here by showing that Escherichia coli operons tend to be composed of genes that belong to the same general class of cellular function. Furthermore, a comparison of the genomic organization of E. coli with that of Bacillus subtilis shows that B. subtilis genes homologous to genes in experimentally characterized E. coli operons tend to cluster in neighboring regions of the genome. This tendency is greater for the subset of E. coli operons whose genes belong to a single functional class. These observations indicate strong evolutionary pressure that, translated into functional constraints, leads to the inclusion of many essential functions in conserved operons and clusters in these two distant species.
|
20
|
Conference Report: The ESF Programme on Integrated Approaches for Functional Genomics. Workshop on 'Data Integration in Functional Genomics and Proteomics'. Comp Funct Genomics 2002; 3:16-21. [PMID: 18628884 PMCID: PMC2447244 DOI: 10.1002/cfg.134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022] Open
|
21
|
RIBDB: An SRS Based Infrastructure for REALIS. Comp Funct Genomics 2002; 3:35-6. [PMID: 18628878 PMCID: PMC2447238 DOI: 10.1002/cfg.139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 12/06/2001] [Accepted: 12/06/2001] [Indexed: 11/11/2022] Open
Abstract
The REALIS project is an EU-funded consortium for the post-genomic analysis of the food pathogen Listeria monocytogenes. The data generated by the consortium members are stored in the RIBDB database, a system built using SRS that integrates consortium data, public databases, and applications for analysis. RIBDB is available to all consortium members through a web server, with the option of installing a local mirror of the main server for local analysis.
|