1
The Molecular Network behind Volatile Aroma Formation in Pear (Pyrus spp. Panguxiang) Revealed by Transcriptome Profiling via Fatty Acid Metabolic Pathways. Life (Basel) 2022; 12:life12101494. [PMID: 36294930 PMCID: PMC9605550 DOI: 10.3390/life12101494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 09/12/2022] [Accepted: 09/21/2022] [Indexed: 11/16/2022] Open
Abstract
Pears are popular table fruits, grown and consumed worldwide for their excellent color, aroma, and taste. Volatile aroma is an important factor affecting fruit quality, and the fatty acid metabolism pathway is important in synthesizing volatile aromas. Most of the white pear varieties cultivated in China are not strongly scented, which significantly affects their overall quality. Panguxiang is a white pear cultivar, but its aroma is strong and has unique components. The study of the mechanisms by which aroma is formed in Panguxiang is, therefore, essential to improving the quality of the fruit. The study analyzed physiological and transcriptome factors to reveal the molecular network behind volatile aroma formation in Panguxiang. Panguxiang fruit samples were collected during two periods: fruit development (60, 90, 120, and 147 days) and fruit storage (0, 7, 14, 21, and 28 days). A total of nine sample stages were used for RNA extraction and paired-end sequencing. In addition, RNA quantification and qualification, library preparation and sequencing, data analysis and gene annotation, gene co-expression network analysis, and validation of differentially expressed genes (DEGs) through quantitative real-time PCR (qRT-PCR) were performed in this study. The weighted gene co-expression network analysis (WGCNA) identified yellow functional modules and several biological and metabolic pathways related to fatty acid formation. Finally, we identified seven and eight hub genes in the fatty acid synthesis and fatty acid metabolism pathways, respectively. Further analysis of the co-expression network allowed us to identify several key transcription factors related to the volatile aroma, including AP2/ERF-ERF, C3H, MYB, NAC, C2H2, GRAS, and Trihelix, which may also be involved in fatty acid synthesis. This study lays a theoretical foundation for studying volatile compounds in pear fruits and provides a theoretical basis for related research in other fruits.
2
Seif Y, Palsson BØ. Path to improving the life cycle and quality of genome-scale models of metabolism. Cell Syst 2021; 12:842-859. [PMID: 34555324 PMCID: PMC8480436 DOI: 10.1016/j.cels.2021.06.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/20/2020] [Revised: 02/17/2021] [Accepted: 06/23/2021] [Indexed: 11/28/2022]
Abstract
Genome-scale models of metabolism (GEMs) are key computational tools for the systems-level study of metabolic networks. Here, we describe the "GEM life cycle," which we subdivide into four stages: inception, maturation, specialization, and amalgamation. We show how different types of GEM reconstruction workflows fit in each stage and proceed to highlight two fundamental bottlenecks for GEM quality improvement: GEM maturation and content removal. We identify common characteristics contributing to increasing quality of maturing GEMs drawing from past independent GEM maturation efforts. We then shed some much-needed light on the latent and unrecognized but pervasive issue of content removal, demonstrating the substantial effects of model pruning on its solution space. Finally, we propose a novel framework for content removal and associated confidence-level assignment which will help guide future GEM development efforts, reduce duplication of effort across groups, potentially aid automated reconstruction platforms, and boost the reproducibility of model development.
Affiliation(s)
- Yara Seif
- Department of Bioengineering, University of California, San Diego, La Jolla, San Diego, CA 92093, USA
- Bernhard Ørn Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, San Diego, CA 92093, USA.
3
Asraf SS, Rajnish K, Gunasekaran P. Genomics Perspectives of Bioethanol Producing Zymomonas Mobilis. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
In recent years, a continuous increase in demand for fossil fuels has led to the need for new potential fuel sources. Biofuels, in particular ethanol, are of high interest because of dwindling fossil fuels. Among the ethanol producers, Zymomonas mobilis has acquired greater interest because it is a renewable source of bioethanol. Zymomonas mobilis is an aerotolerant, gram-negative, ethanol-producing bacterium that shows high ethanol yield, tolerance, and greater productivity. This chapter focuses on recent efforts to engineer Z. mobilis, on transcriptomic and genome-based metabolomic studies, and on bioinformatics exploitation of the available genomic data for the production of bioethanol. Recently, several bioinformatics tools have been used to predict the functional properties of the carbohydrate-active ethanologenic enzymes in Z. mobilis. A number of processes were used to study the functional properties of the ethanologenic enzymes of Z. mobilis. Thus, functional genomics seeks to apply technologies that would help to improve the production of bioethanol by Z. mobilis.
4
Snoei J, Urbach H, Engels G, Fassunke J, von Lehe M, Becker AJ, Majores M. Genetic alterations of protein-o-mannosyltransferase-1 in glioneuronal and glial brain tumors with subarachnoid spread. Neuropathology 2008; 29:116-24. [PMID: 18647264 DOI: 10.1111/j.1440-1789.2008.00954.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
Abstract
Leptomeningeal spread is a casual but conspicuous finding in both low- and high-grade gliomas. We hypothesized a compromised integrity of the glia limitans-basal lamina complex due to glycosylation defects by loss of protein-o-mannosyltransferase-1 (POMT1) activity, also a well-known feature in developmental brain disorders with leptomeningeal heterotopia. Hypothesizing an analogous mechanism in gliomas, we performed a comprehensive polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) analysis of the POMT1 gene in 41 brain tumor specimens. Each specimen was subjected to laser capture microdissection to dissect: (i) subarachnoid tumor components; (ii) deeply localized tumor areas; and (iii) histologically unaffected CNS fragments. In addition, leukocyte DNA of healthy Caucasians served as controls (n = 100). Sequence alterations were found in exons 7, 9, 15 and 18. Exon 7 bore two sequence alterations, one 751C > T transition with amino acid exchange of arginine by tryptophan (Arg251Trp) (n = 12/41 in Tu vs n = 7/82 in Co) and a 752G > A transition with replacement of arginine by glutamine (Arg251Gln) (n = 3/41 in Tu vs n = 0/82 in Co) that were significantly increased in the tumor specimens compared to controls (P < 0.05). A 979G > A transition in exon 9 resulted in a valine to isoleucine switch (Val327Ile) (n = 6/40 in Tu vs n = 4/84 in Co). Individual specimens revealed a 1565G > A (Arg522Lys) transition in exon 15 and a 1922C > T (Ala641Val) transition in exon 18. Two gangliogliomas only revealed sequence alterations in the superficial area but not in intraparenchymal and adjacent control specimens. We conclude that a significant increase of POMT1 missense mutations may indicate a functional role in neoplastic conditions in individual tumors. Future studies will be important to evaluate a functional impact of POMT1 alterations in human brain tumors.
Affiliation(s)
- Julia Snoei
- Department of Neuropathology, University of Bonn Medical Center, Bonn, Germany.
5
Xie H, Vucetic S, Iakoucheva LM, Oldfield CJ, Dunker AK, Uversky VN, Obradovic Z. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J Proteome Res 2007; 6:1882-98. [PMID: 17391014 PMCID: PMC2543138 DOI: 10.1021/pr060392u] [Citation(s) in RCA: 422] [Impact Index Per Article: 24.8] [Indexed: 11/30/2022]
Abstract
Identifying relationships between function, amino acid sequence, and protein structure represents a major challenge. In this study, we propose a bioinformatics approach that identifies functional keywords in the Swiss-Prot database that correlate with intrinsic disorder. A statistical evaluation is employed to rank the significance of these correlations. Protein sequence data redundancy and the relationship between protein length and protein structure were taken into consideration to ensure the quality of the statistical inferences. Over 200,000 proteins from the Swiss-Prot database were analyzed using this approach. The predictions of intrinsic disorder were carried out using the PONDR VL3E predictor of long disordered regions, which achieves an accuracy above 86%. Overall, out of the 710 Swiss-Prot functional keywords that were each associated with at least 20 proteins, 238 were found to be strongly positively correlated with predicted long intrinsically disordered regions, whereas 302 were strongly negatively correlated with such regions. The remaining 170 keywords were ambiguous without strong positive or negative correlation with the disorder predictions. These functions cover a large variety of biological activities and imply that disordered regions are characterized by a wide functional repertoire. Our results agree well with literature findings, as we were able to find at least one illustrative example of functional disorder or order shown experimentally for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder. This work opens a series of three papers, which enriches the current view of protein structure-function relationships, especially with regard to functionalities of intrinsically disordered proteins, and provides researchers with a novel tool that could be used to improve the understanding of the relationships between protein structure and function. The first paper of the series describes our statistical approach, outlines the major findings, and provides illustrative examples of biological processes and functions positively and negatively correlated with intrinsic disorder.
Affiliation(s)
- Hongbo Xie
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Slobodan Vucetic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Lilia M. Iakoucheva
- Laboratory of Statistical Genetics, The Rockefeller University, New York, NY 10021, USA
- Christopher J. Oldfield
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202, USA
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202, USA
- Vladimir N. Uversky
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
- Zoran Obradovic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
6
Xie H, Vucetic S, Iakoucheva LM, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN. Functional anthology of intrinsic disorder. 3. Ligands, post-translational modifications, and diseases associated with intrinsically disordered proteins. J Proteome Res 2007; 6:1917-32. [PMID: 17391016 PMCID: PMC2588348 DOI: 10.1021/pr060394e] [Citation(s) in RCA: 298] [Impact Index Per Article: 17.5] [Indexed: 11/29/2022]
Abstract
Currently, the understanding of the relationships between function, amino acid sequence, and protein structure continues to represent one of the major challenges of modern protein science. As many as 50% of eukaryotic proteins are likely to contain functionally important long disordered regions. Many proteins are wholly disordered but still possess numerous biologically important functions. However, the number of experimentally confirmed disordered proteins with known biological functions is substantially smaller than their actual number in nature. Therefore, there is a crucial need for novel bioinformatics approaches that allow projection of the current knowledge from a few experimentally verified examples to much larger groups of known and potential proteins. The elaboration of a bioinformatics tool for the analysis of functional diversity of intrinsically disordered proteins and application of this data mining tool to >200 000 proteins from the Swiss-Prot database, each annotated with at least one of the 875 functional keywords, was described in the first paper of this series (Xie, H.; Vucetic, S.; Iakoucheva, L. M.; Oldfield, C. J.; Dunker, A. K.; Obradovic, Z.; Uversky, V. N. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J. Proteome Res. 2007, 6, 1882-1898). Using this tool, we have found that out of the 710 Swiss-Prot functional keywords associated with at least 20 proteins, 262 were strongly positively correlated with long intrinsically disordered regions, and 302 were strongly negatively correlated. Illustrative examples of functional disorder or order were found for the vast majority of keywords showing strongest positive or negative correlation with intrinsic disorder, respectively. Some 80 Swiss-Prot keywords associated with disorder- and order-driven biological processes and protein functions were described in the first paper (see above). The second paper of the series was devoted to the presentation of 87 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes, and coding sequence diversities possessing strong positive and negative correlation with long disordered regions (Vucetic, S.; Xie, H.; Iakoucheva, L. M.; Oldfield, C. J.; Dunker, A. K.; Obradovic, Z.; Uversky, V. N. Functional anthology of intrinsic disorder. 2. Cellular components, domains, technical terms, developmental processes, and coding sequence diversities correlated with long disordered regions. J. Proteome Res. 2007, 6, 1899-1916). Protein structure and functionality can be modulated by various post-translational modifications and/or as a result of binding of specific ligands. Numerous human diseases are associated with protein misfolding/misassembly/misfunctioning. This work concludes the series of papers dedicated to the functional anthology of intrinsic disorder and describes approximately 80 Swiss-Prot functional keywords that are related to ligands, post-translational modifications, and diseases possessing strong positive or negative correlation with the predicted long disordered regions in proteins.
Affiliation(s)
- Hongbo Xie
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Slobodan Vucetic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Lilia M. Iakoucheva
- Laboratory of Statistical Genetics, The Rockefeller University, New York, NY 10021
- Christopher J. Oldfield
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- Zoran Obradovic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Vladimir N. Uversky
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
- Correspondence should be addressed to: Vladimir N. Uversky, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, MS#4021, Indianapolis, IN 46202, USA; Phone: 317-278-9194; Fax: 317-274-4686; E-mail:
7
Vucetic S, Xie H, Iakoucheva LM, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN. Functional anthology of intrinsic disorder. 2. Cellular components, domains, technical terms, developmental processes, and coding sequence diversities correlated with long disordered regions. J Proteome Res 2007; 6:1899-916. [PMID: 17391015 PMCID: PMC2588346 DOI: 10.1021/pr060393m] [Citation(s) in RCA: 193] [Impact Index Per Article: 11.4] [Indexed: 11/28/2022]
Abstract
Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie, H.; Vucetic, S.; Iakoucheva, L. M.; Oldfield, C. J.; Dunker, A. K.; Obradovic, Z.; Uversky, V. N. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J. Proteome Res. 2007, 6, 1882-1898). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes approximately 90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes, and coding sequence diversities possessing strong positive and negative correlation with long disordered regions.
Affiliation(s)
- Slobodan Vucetic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Hongbo Xie
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Lilia M. Iakoucheva
- Laboratory of Statistical Genetics, The Rockefeller University, New York, NY 10021
- Christopher J. Oldfield
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- Zoran Obradovic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122
- Vladimir N. Uversky
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University, School of Medicine, Indianapolis, IN 46202
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
- Correspondence should be addressed to: Vladimir N. Uversky, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, MS#4021, Indianapolis, IN 46202, USA; Phone: 317-278-9194; Fax: 317-274-4686; E-mail:
8
O'Donoghue P, Luthey-Schulten Z. Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information. J Mol Biol 2005; 346:875-94. [PMID: 15713469 DOI: 10.1016/j.jmb.2004.11.053] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Received: 08/11/2004] [Revised: 11/11/2004] [Accepted: 11/17/2004] [Indexed: 11/22/2022]
Abstract
We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A leads to the formation of a minimal basis set that spans the phase space of the problem at hand well. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q(H), is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.
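The column re-ordering idea mentioned in this abstract can be illustrated with a small sketch. It is not the authors' multidimensional QR method: it is a plain greedy column-pivoting loop (modified Gram-Schmidt) on an ordinary matrix, and the function name `pivoted_column_order` and the toy data are assumptions made for illustration only. Columns that come first in the returned order are the least redundant; columns at the end are (nearly) linear combinations of earlier ones.

```python
import math


def pivoted_column_order(columns, tol=1e-12):
    """Order column indices by greedy column pivoting.

    `columns` is a list of equal-length numeric lists, one per column.
    At each step the column with the largest remaining (residual) norm
    is chosen, and its direction is projected out of all other columns.
    """
    residual = [list(col) for col in columns]
    remaining = list(range(len(columns)))
    order = []
    while remaining:
        # Pivot: the column with the largest squared residual norm.
        pivot = max(remaining, key=lambda j: sum(x * x for x in residual[j]))
        norm = math.sqrt(sum(x * x for x in residual[pivot]))
        order.append(pivot)
        remaining.remove(pivot)
        if norm < tol:
            # Everything left is numerically dependent on earlier pivots.
            order.extend(remaining)
            break
        q = [x / norm for x in residual[pivot]]
        for j in remaining:
            dot = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [a - dot * b for a, b in zip(residual[j], q)]
    return order


# Toy example: columns 0 and 3 are redundant given columns 1 and 2.
cols = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.9, 0.1, 0.0]]
order = pivoted_column_order(cols)
representatives = order[:2]  # keep the two least redundant columns
```

In this toy case the first two indices in `order` are 1 and 2, so those columns would be kept as representatives and the rest treated as redundant.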
Affiliation(s)
- Patrick O'Donoghue
- Department of Chemistry, University of Illinois at Urbana-Champaign, 600 S. Mathews, Urbana, IL 61801, USA
9
Yousef GM, Elliott MB, Kopolovic AD, Serry E, Diamandis EP. Sequence and evolutionary analysis of the human trypsin subfamily of serine peptidases. Biochimica et Biophysica Acta - Proteins and Proteomics 2004; 1698:77-86. [PMID: 15063317 DOI: 10.1016/j.bbapap.2003.10.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Received: 06/03/2003] [Revised: 10/01/2003] [Accepted: 10/27/2003] [Indexed: 10/26/2022]
Abstract
Serine peptidases (SP) are peptidases with a uniquely activated serine residue in the substrate-binding site. SP can be classified into clans with distinct evolutionary histories and each clan further subdivided into families. We analyzed 79 proteins representing the S1A subfamily of human SP, obtained from different databases. Multiple alignment identified 87 highly conserved amino acid residues. In most cases of substitution, a residue of similar character was inserted, implying that the overall character of the local region was conserved. We also identified several conserved protein motifs. Seven to thirteen cysteine positions, potentially forming disulfide bridges, were also found to be conserved. Most members are secreted as inactive (pro) forms with a trypsin-like cleavage site for activation. Substrate specificity was predicted to be trypsin-like for most members, with few chymotrypsin-like proteins. Phylogenetic analysis enabled us to classify members of the S1A subfamily into structurally related groups; this might also help to functionally sort members of this subfamily and give an idea about their possible functions.
Affiliation(s)
- George M Yousef
- Department of Pathology and Laboratory Medicine, Division of Clinical Biochemistry, Mount Sinai Hospital, 600 University Avenue, Toronto, ON, Canada M5G 1X5
10
Abstract
A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium.
Affiliation(s)
- Rolf Apweiler
- The EMBL Outstation-The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
11
Abstract
We introduce a metric for local sequence alignments that has utility for accelerating optimal alignment searches without loss of sensitivity. The metric's triangle inequality property permits identification of redundant database entries guaranteed to have optimal alignments to the query sequence that fall below a specified score threshold, thereby permitting comparisons to these entries to be skipped. We prove the existence of the metric for a variety of scoring systems, including the most commonly used ones, and show that a triangle inequality can be established as well for nucleotide-to-protein sequence comparisons. We discuss a database clustering and search strategy that takes advantage of the triangle inequality. The strategy permits moderate but significant acceleration of searches against the widely used "nr" protein database. It also provides a theoretically based method for database clustering in general and provides a standard against which to compare heuristic clustering strategies.
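The pruning argument in this abstract can be sketched in a generic metric space. The example below is an assumption-laden stand-in, not the paper's alignment metric: it uses Hamming distance on equal-length strings as the metric, treats "distance above a cutoff" as the analogue of "score below a threshold", and the names `hamming` and `search_with_pruning` are hypothetical. For a cluster representative r with a stored distance d(r, m) to each member m, the triangle inequality gives d(q, m) >= |d(q, r) - d(r, m)|, so any member whose lower bound exceeds the cutoff can be skipped without being compared to the query q.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def search_with_pruning(query, clusters, max_dist, dist=hamming):
    """Return (hits, comparisons) for entries within max_dist of query.

    `clusters` is a list of (representative, members) pairs, where
    `members` is a list of (sequence, precomputed d(rep, sequence)).
    """
    hits = []
    comparisons = 0
    for rep, members in clusters:
        d_qr = dist(query, rep)
        comparisons += 1
        if d_qr <= max_dist:
            hits.append(rep)
        for member, d_rm in members:
            # Triangle inequality lower bound on d(query, member).
            if abs(d_qr - d_rm) > max_dist:
                continue  # guaranteed miss: skip the comparison
            comparisons += 1
            if dist(query, member) <= max_dist:
                hits.append(member)
    return hits, comparisons


# Toy database with two clusters of near-identical sequences.
clusters = [
    ("AAAAAA", [("AAAAAT", 1)]),
    ("GGGGGG", [("GGGGGC", 1)]),
]
hits, comparisons = search_with_pruning("AAAATT", clusters, max_dist=2)
```

Here the member of the second cluster is never compared to the query, because its lower bound already exceeds the cutoff; the search returns the same hits as an exhaustive scan with fewer distance evaluations.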
Affiliation(s)
- Peter A Spiro
- Incyte Genomics, Inc., 3160 Porter Drive, Palo Alto, CA 94304, USA.
12
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003; 31:365-70. [PMID: 12520024 PMCID: PMC165542 DOI: 10.1093/nar/gkg095] [Citation(s) in RCA: 2333] [Impact Index Per Article: 111.1] [Indexed: 11/12/2022] Open
Abstract
The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.
Affiliation(s)
- Brigitte Boeckmann
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland.
13
Kriventseva EV, Fleischmann W, Zdobnov EM, Apweiler R. CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res 2001; 29:33-6. [PMID: 11125042 PMCID: PMC29804 DOI: 10.1093/nar/29.1.33] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.7] [Received: 08/28/2000] [Revised: 10/17/2000] [Accepted: 10/17/2000] [Indexed: 11/12/2022] Open
Abstract
The CluSTr (Clusters of SWISS-PROT and TrEMBL proteins) database offers an automatic classification of SWISS-PROT and TrEMBL proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. Analysis has been carried out for different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam and ProDom. Links to the InterPro graphical interface allow users to see at a glance whether proteins from the cluster share particular functional sites. CluSTr also provides cross-references to HSSP and PDB. The database is available for querying and browsing at http://www.ebi.ac.uk/clustr.
Affiliation(s)
- E V Kriventseva
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
14
Affiliation(s)
- R Apweiler
- EMBL Outstation-The European Bioinformatics Institute, Cambridge, United Kingdom