1
Gong P. Dynamic integration of biological data sources using the data concierge. Health Inf Sci Syst 2013; 1:7. [PMID: 25825659 PMCID: PMC4340781 DOI: 10.1186/2047-2501-1-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/18/2012] [Accepted: 09/26/2012] [Indexed: 11/22/2022]
Abstract
BACKGROUND The ever-changing landscape of large-scale network environments and innovative biology technologies requires dynamic mechanisms to rapidly integrate previously unknown bioinformatics sources at runtime. However, existing integration technologies lack sufficient flexibility to adapt to these changes, because the techniques used for integration are static and sensitive to new or changing bioinformatics source implementations and evolutionary biologist requirements. METHODS To address this challenge, in this paper we propose a new semantics-based adaptive middleware, the Data Concierge, which is able to dynamically integrate heterogeneous biological data sources without the need for wrappers. Along with the architecture necessary to facilitate dynamic integration, an API description mechanism is proposed to dynamically classify, recognize, locate, and invoke newly added biological data source functionalities. Based on the unified semantic metadata, XML-based state machines are able to provide flexible configurations to execute biologists' abstract and complex operations. RESULTS AND DISCUSSION Experimental results demonstrate that, to obtain dynamic features, the Data Concierge sacrifices a reasonable amount of performance on reasoning over knowledge models and on dynamic data source API invocations. The overall costs to integrate new biological data sources are significantly lower when using the Data Concierge. CONCLUSIONS The Data Concierge facilitates the rapid integration of new biological data sources into existing applications without repetitive software development; hence, this mechanism would provide a cost-effective solution to labor-intensive software engineering tasks.
Affiliation(s)
- Peng Gong
- Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, the University of Sydney, Sydney, NSW 2006 Australia
- Department of PET and Nuclear Medicine, RPA Hospital, Camperdown, NSW 2050 Australia
2
Sawyeria marylandensis (Heterolobosea) has a hydrogenosome with novel metabolic properties. Eukaryot Cell 2010; 9:1913-24. [PMID: 21037180 DOI: 10.1128/ec.00122-10] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
Protists that live under low-oxygen conditions often lack conventional mitochondria and instead possess mitochondrion-related organelles (MROs) with distinct biochemical functions. Studies of mostly parasitic organisms have suggested that these organelles could be classified into two general types: hydrogenosomes and mitosomes. Hydrogenosomes, found in parabasalids, anaerobic chytrid fungi, and ciliates, metabolize pyruvate anaerobically to generate ATP, acetate, CO2, and hydrogen gas, employing enzymes not typically associated with mitochondria. Mitosomes that have been studied have no apparent role in energy metabolism. Recent investigations of free-living anaerobic protists have revealed a diversity of MROs with a wider array of metabolic properties that defy a simple functional classification. Here we describe an expressed sequence tag (EST) survey and ultrastructural investigation of the anaerobic heteroloboseid amoeba Sawyeria marylandensis aimed at understanding the properties of its MROs. This organism expresses typical anaerobic energy metabolic enzymes, such as pyruvate:ferredoxin oxidoreductase, [FeFe]-hydrogenase, and associated hydrogenase maturases with apparent organelle-targeting peptides, indicating that its MRO likely functions as a hydrogenosome. We also identified 38 genes encoding canonical mitochondrial proteins in S. marylandensis, many of which possess putative targeting peptides and are phylogenetically related to putative mitochondrial proteins of its heteroloboseid relative Naegleria gruberi. Several of these proteins, such as a branched-chain alpha keto acid dehydrogenase, likely function in pathways that have not been previously associated with the well-studied hydrogenosomes of parabasalids. Finally, morphological reconstructions based on transmission electron microscopy indicate that the S. marylandensis MROs form novel cup-like structures within the cells.
Overall, these data suggest that Sawyeria marylandensis possesses a hydrogenosome of mitochondrial origin with a novel combination of biochemical and structural properties.
3
Ramírez S, Muñoz-Mérida A, Karlsson J, García M, Pérez-Pulido AJ, Claros MG, Trelles O. MOWServ: a web client for integration of bioinformatic resources. Nucleic Acids Res 2010; 38:W671-6. [PMID: 20525794 PMCID: PMC2896175 DOI: 10.1093/nar/gkq497] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The productivity of any scientist is affected by the cumbersome, tedious and time-consuming tasks needed to make heterogeneous web services compatible so that they can be useful in research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and MOWServ has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. Service discovery has been greatly enhanced by the Magallanes software and biodataSF. User data are stored securely on the main server, protected by an authentication protocol that enables monitoring of users' current or already finished tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of these features have increased MOWServ's service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.
Affiliation(s)
- Sergio Ramírez
- Departamento Arquitectura de Computadores, Escuela Técnica Superior de Ingeniería Informática, Universidad de Málaga, Málaga, Spain
4
Dinov ID, Rubin D, Lorensen W, Dugan J, Ma J, Murphy S, Kirschner B, Bug W, Sherman M, Floratos A, Kennedy D, Jagadish HV, Schmidt J, Athey B, Califano A, Musen M, Altman R, Kikinis R, Kohane I, Delp S, Parker DS, Toga AW. iTools: a framework for classification, categorization and integration of computational biology resources. PLoS One 2008; 3:e2265. [PMID: 18509477 PMCID: PMC2386255 DOI: 10.1371/journal.pone.0002265] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 02/20/2008] [Accepted: 03/27/2008] [Indexed: 11/22/2022]
Abstract
The advancement of the computational biology field hinges on progress in three fundamental directions: the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources: data, software tools and web services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existing computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first is based on an ontology of computational biology resources, and the second is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project in terms of both its source code development and its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management.
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
Affiliation(s)
- Ivo D. Dinov
- Center for Computational Biology, University of California Los Angeles, Los Angeles, California, United States of America
- Daniel Rubin
- National Center for Biomedical Ontology, Stanford University, Stanford, California, United States of America
- William Lorensen
- National Alliance for Medical Imaging Computing, Harvard University, Cambridge, Massachusetts, United States of America
- Jonathan Dugan
- Center for Physics-based Simulation of Biological Structures, Stanford University, Stanford, California, United States of America
- Jeff Ma
- Center for Computational Biology, University of California Los Angeles, Los Angeles, California, United States of America
- Shawn Murphy
- Informatics for Integrating Biology and the Bedside, Harvard University, Cambridge, Massachusetts, United States of America
- Beth Kirschner
- National Center for Integrative Biomedical Informatics, University of Michigan, Ann Arbor, Michigan, United States of America
- William Bug
- National Center for Microscopy Imaging Research, University of California San Diego, San Diego, California, United States of America
- Michael Sherman
- Center for Physics-based Simulation of Biological Structures, Stanford University, Stanford, California, United States of America
- Aris Floratos
- National Center for Multi-Scale Study of Cellular Networks, Columbia University, New York, New York, United States of America
- David Kennedy
- Neuroscience Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- H. V. Jagadish
- National Center for Integrative Biomedical Informatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Jeanette Schmidt
- Center for Physics-based Simulation of Biological Structures, Stanford University, Stanford, California, United States of America
- Brian Athey
- National Center for Integrative Biomedical Informatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Andrea Califano
- National Center for Multi-Scale Study of Cellular Networks, Columbia University, New York, New York, United States of America
- Mark Musen
- National Center for Biomedical Ontology, Stanford University, Stanford, California, United States of America
- Russ Altman
- Center for Physics-based Simulation of Biological Structures, Stanford University, Stanford, California, United States of America
- Ron Kikinis
- National Alliance for Medical Imaging Computing, Harvard University, Cambridge, Massachusetts, United States of America
- Isaac Kohane
- Informatics for Integrating Biology and the Bedside, Harvard University, Cambridge, Massachusetts, United States of America
- Scott Delp
- Center for Physics-based Simulation of Biological Structures, Stanford University, Stanford, California, United States of America
- D. Stott Parker
- Center for Computational Biology, University of California Los Angeles, Los Angeles, California, United States of America
- Arthur W. Toga
- Center for Computational Biology, University of California Los Angeles, Los Angeles, California, United States of America
5
Ruiz-Trillo I, Roger AJ, Burger G, Gray MW, Lang BF. A phylogenomic investigation into the origin of metazoa. Mol Biol Evol 2008; 25:664-72. [PMID: 18184723 DOI: 10.1093/molbev/msn006] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 01/01/2023]
Abstract
The evolution of multicellular animals (Metazoa) from their unicellular ancestors was a key transition that was accompanied by the emergence and diversification of gene families associated with multicellularity. To clarify the timing and order of specific events in this transition, we conducted expressed sequence tag surveys on 4 putative protistan relatives of Metazoa including the choanoflagellate Monosiga ovata, the ichthyosporeans Sphaeroforma arctica and Amoebidium parasiticum, and the amoeba Capsaspora owczarzaki, and 2 members of Amoebozoa, Acanthamoeba castellanii and Mastigamoeba balamuthi. We find that homologs of genes involved in metazoan multicellularity exist in several of these unicellular organisms, including 1 encoding a membrane-associated guanylate kinase with an inverted arrangement of protein-protein interaction domains (MAGI) in Capsaspora. In Metazoa, MAGI regulates tight junctions involved in cell-cell communication. By phylogenomic analyses of genes encoded in nuclear and mitochondrial genomes, we show that the choanoflagellates are the closest relatives of the Metazoa, followed by the Capsaspora and Ichthyosporea lineages, although the branching order between the latter 2 groups remains unclear. Understanding the function of "metazoan-specific" proteins we have identified in these protists will clarify the evolutionary steps that led to the emergence of the Metazoa.
Affiliation(s)
- Iñaki Ruiz-Trillo
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada.
6
Shen YQ, Burger G. 'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools. BMC Bioinformatics 2007; 8:420. [PMID: 17967180 PMCID: PMC2176073 DOI: 10.1186/1471-2105-8-420] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 06/11/2007] [Accepted: 10/29/2007] [Indexed: 12/02/2022]
Abstract
Background Knowing the subcellular location of proteins provides clues to their function as well as to the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but is tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes. Results In order to develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools by several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) far surpasses that of the individual tools. The method of integration proved crucial to the performance. For the prediction of mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis on biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains. Conclusion We developed an approach that enhances the prediction accuracy of mitochondrial proteins by uniting the strengths of specialized tools. The combination of machine-learning-based integration with biological expert knowledge leads to improved performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement and applicable to predicting subcellular locations other than mitochondria, as well as other biological features.
For a trial of our approach, we provide a webservice for mitochondrial protein prediction (named YimLOC), which can be accessed through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in the Additional File 2.
Affiliation(s)
- Yao Qing Shen
- Robert Cedergren Center for Bioinformatics and Genomics, Biochemistry Department, Université de Montréal, 2900 Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada.
7
Koziol AG, Borza T, Ishida KI, Keeling P, Lee RW, Durnford DG. Tracing the evolution of the light-harvesting antennae in chlorophyll a/b-containing organisms. Plant Physiol 2007; 143:1802-16. [PMID: 17307901 PMCID: PMC1851817 DOI: 10.1104/pp.106.092536] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.8] [Indexed: 05/14/2023]
Abstract
The light-harvesting complexes (LHCs) of land plants and green algae have essential roles in light capture and photoprotection. Though the functional diversity of the individual LHC proteins is well described in many land plants, the extent of this family in the majority of green algal groups is unknown. To examine the evolution of the chlorophyll a/b antennae system and to infer its ancestral state, we initiated several expressed sequence tag projects from a taxonomically broad range of chlorophyll a/b-containing protists. This included representatives from the Ulvophyceae (Acetabularia acetabulum), the Mesostigmatophyceae (Mesostigma viride), and the Prasinophyceae (Micromonas sp.), as well as one representative from each of the Euglenozoa (Euglena gracilis) and Chlorarachniophyta (Bigelowiella natans), whose plastids evolved secondarily from a green alga. It is clear that the core antenna system was well developed prior to green algal diversification and likely consisted of the CP29 (Lhcb4) and CP26 (Lhcb5) proteins associated with photosystem II, plus a photosystem I antenna composed of proteins encoded by at least Lhca3 and two green algal-specific proteins encoded by the Lhca2 and Lhca9 genes. In organisms containing secondary plastids, we found no evidence for orthologs to the plant/algal antennae with the exception of CP29. We also identified PsbS homologs in the Ulvophyceae and the Prasinophyceae, indicating that this distinctive protein appeared prior to green algal diversification. This analysis provides a snapshot of the antenna systems in diverse green algae, and allows us to infer the changing complexity of the antenna system during green algal evolution.
Affiliation(s)
- Adam G Koziol
- Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada E3B 5A3
8
Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Wagener J, Murray-Rust P, Steinbeck C, Wikberg JES. Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinformatics 2007; 8:59. [PMID: 17316423 PMCID: PMC1808478 DOI: 10.1186/1471-2105-8-59] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Received: 12/01/2006] [Accepted: 02/22/2007] [Indexed: 11/13/2022]
Abstract
Background There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more, all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems, as it can easily be extended with functionality in any desired direction. Conclusion Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins and commercial ones. Bioclipse is freely available at .
Affiliation(s)
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Tobias Helmus
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
- Egon L Willighagen
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
- Stefan Kuhn
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
- Martin Eklund
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Peter Murray-Rust
- Department of Chemistry, Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
- Christoph Steinbeck
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
- Jarl ES Wikberg
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
9
Lang BF, Laforest MJ, Burger G. Mitochondrial introns: a critical view. Trends Genet 2007; 23:119-25. [PMID: 17280737 DOI: 10.1016/j.tig.2007.01.006] [Citation(s) in RCA: 234] [Impact Index Per Article: 13.8] [Received: 09/04/2006] [Revised: 12/14/2006] [Accepted: 01/18/2007] [Indexed: 11/17/2022]
Abstract
Although group I and group II introns were discovered more than 25 years ago, they are still difficult to identify. Modeling their RNA structure also remains particularly challenging for organelle sequences, owing to their great diversity. In fact, accelerated evolution in organelles often results in a reduced RNA structure and a loss of autocatalytic splicing and intron mobility. We set out to identify all mitochondrial group I and II introns in published sequences, and, to this end, we developed and applied a new search approach: RNAweasel. On the basis of the results, we focus here on building a comprehensive picture of mitochondrial group I introns, including a modified (reduced) consensus RNA secondary structure and a concise phylogeny-based subclassification.
Affiliation(s)
- B Franz Lang
- Robert Cedergren Centre, Program in Evolutionary Biology, Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada.
10
O'Brien EA, Koski LB, Zhang Y, Yang L, Wang E, Gray MW, Burger G, Lang BF. TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res 2007; 35:D445-51. [PMID: 17202165 PMCID: PMC1899108 DOI: 10.1093/nar/gkl770] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Received: 07/25/2006] [Revised: 09/11/2006] [Accepted: 10/01/2006] [Indexed: 11/29/2022]
Abstract
The TBestDB database contains approximately 370,000 clustered expressed sequence tag (EST) sequences from 49 organisms, covering a taxonomically broad range of poorly studied, mainly unicellular eukaryotes, and includes experimental information, consensus sequences, gene annotations and metabolic pathway predictions. Most of these ESTs have been generated by the Protist EST Program, a collaboration among six Canadian research groups. EST sequences are read from trace files up to a minimum quality cut-off, vector and linker sequence is masked, and the ESTs are clustered using phrap. The resulting consensus sequences are automatically annotated by using the AutoFACT program. The datasets are automatically checked for clustering errors due to chimerism and potential cross-contamination between organisms, and suspect data are flagged in or removed from the database. Access to data deposited in TBestDB by individual users can be restricted to those users for a limited period. With this first report on TBestDB, we open the database to the research community for free processing, annotation, interspecies comparisons and GenBank submission of EST data generated in individual laboratories. For instructions on submission to TBestDB, contact tbestdb@bch.umontreal.ca. The database can be queried at http://tbestdb.bcm.umontreal.ca/.
Affiliation(s)
- Emmet A O'Brien
- Département de Biochimie, Canadian Institute for Advanced Research, Robert-Cedergren Centre for Research in Bioinformatics and Genomics, Université de Montréal, 2900 Edouard-Montpetit, Montréal, QC, Canada H3T 1J4.
11
Biegert A, Mayer C, Remmert M, Söding J, Lupas AN. The MPI Bioinformatics Toolkit for protein sequence analysis. Nucleic Acids Res 2006; 34:W335-9. [PMID: 16845021 PMCID: PMC1538786 DOI: 10.1093/nar/gkl217] [Citation(s) in RCA: 224] [Impact Index Per Article: 12.4] [Indexed: 12/05/2022]
Abstract
The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at .
Affiliation(s)
- Andreas Biegert
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076 Tübingen, Germany.
12
Wren JD, Johnson D, Gruenwald L. Automating genomic data mining via a sequence-based matrix format and associative rule set. BMC Bioinformatics 2005; 6 Suppl 2:S2. [PMID: 16026599 PMCID: PMC1637034 DOI: 10.1186/1471-2105-6-s2-s2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 02/03/2023]
Abstract
There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
Affiliation(s)
- Jonathan D Wren
- Advanced Center for Genome Technology, Department of Botany and Microbiology, 101 David L. Boren Blvd. Rm 2025
- David Johnson
- School of Computer Science, The University of Oklahoma, Norman, Oklahoma 73019
- Le Gruenwald
- School of Computer Science, The University of Oklahoma, Norman, Oklahoma 73019
13
Gracy J, Chiche L. PAT: a protein analysis toolkit for integrated biocomputing on the web. Nucleic Acids Res 2005; 33:W65-71. [PMID: 15980554 PMCID: PMC1160216 DOI: 10.1093/nar/gki455] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 02/14/2005] [Revised: 04/01/2005] [Accepted: 04/01/2005] [Indexed: 11/25/2022]
Abstract
PAT, for Protein Analysis Toolkit, is an integrated biocomputing server. The main goal of its design was to facilitate the combination of different processing tools for complex protein analyses and to simplify the automation of repetitive tasks. The PAT server provides a standardized web interface to a wide range of protein analysis tools. It is designed as a streamlined analysis environment that implements many features that greatly simplify studies dealing with protein sequences and structures and improve productivity. PAT is able to read and write data in many bioinformatics formats and to create any desired pipeline by seamlessly sending the output of one tool to the input of another. PAT can retrieve protein entries from identifier-based queries by using pre-computed database indexes. Users can easily formulate complex queries combining different analysis tools with a few mouse clicks, or via a dedicated macro language, and a web session manager provides direct access to any temporary file generated during the user session. PAT is freely accessible on the Internet at http://pat.cbs.cnrs.fr.
Affiliation(s)
- Jérôme Gracy
- Centre de Biochimie Structurale, UMR5048 and UMR554 CNRS-INSERM-Université Montpellier I, Faculté de Pharmacie, 15 avenue Charles Flahault, BP 14491, 34093 Montpellier-Cedex 5, France.