1
Cai P, Liu S, Zhang D, Xing H, Han M, Liu D, Gong L, Hu QN. SynBioTools: a one-stop facility for searching and selecting synthetic biology tools. BMC Bioinformatics 2023; 24:152. [PMID: 37069545 PMCID: PMC10111727 DOI: 10.1186/s12859-023-05281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/11/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND The rapid development of synthetic biology relies heavily on the use of databases and computational tools, which are also developing rapidly. While many tool registries have been created to facilitate tool retrieval, sharing, and reuse, no relatively comprehensive tool registry or catalog addresses all aspects of synthetic biology. RESULTS We constructed SynBioTools, a comprehensive collection of synthetic biology databases, computational tools, and experimental methods, as a one-stop facility for searching and selecting synthetic biology tools. SynBioTools includes databases, computational tools, and methods extracted from reviews via SCIentific Table Extraction, a scientific table-extraction tool that we built. Approximately 57% of the resources that we located and included in SynBioTools are not mentioned in bio.tools, the dominant tool registry. To improve users' understanding of the tools and to enable them to make better choices, the tools are grouped into nine modules (each with subdivisions) based on their potential biosynthetic applications. Detailed comparisons of similar tools in every classification are included. The URLs, descriptions, source references, and the number of citations of the tools are also integrated into the system. CONCLUSIONS SynBioTools is freely available at https://synbiotools.lifesynther.com/ . It provides end-users and developers with a useful resource of categorized synthetic biology databases, tools, and methods to facilitate tool retrieval and selection.
Affiliation(s)
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
  - Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Huadong Xing
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mengying Han
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dongliang Liu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Linlin Gong
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
2
Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/27/2021] [Indexed: 12/29/2022] Open
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
- Magnus Palmblad
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Jon Ison
  - French Institute of Bioinformatics, 91057 Évry, France
- Ilkay Altintas
  - University of California San Diego, La Jolla, CA, 92093, USA
- Christopher J. O. Baker
  - University of New Brunswick, Saint John, E2L 4L5, Canada
  - IPSNP Computing Inc., Saint John, E2L 4S6, Canada
- Yolanda Gil
  - University of Southern California, Marina Del Rey, CA, 90292, USA
- Carole Goble
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Paul Groth
  - University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Hans Ienasescu
  - Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Pratik Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Tobias Kuhn
  - VU Amsterdam, 1081 HV Amsterdam, The Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Steffen Möller
  - IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
  - Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Robert Stevens
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
- Aswin Verhoeven
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
3
Duvaud S, Gabella C, Lisacek F, Stockinger H, Ioannidis V, Durinx C. Expasy, the Swiss Bioinformatics Resource Portal, as designed by its users. Nucleic Acids Res 2021; 49:W216-W227. [PMID: 33849055 PMCID: PMC8265094 DOI: 10.1093/nar/gkab225] [Citation(s) in RCA: 269] [Impact Index Per Article: 89.7] [Received: 02/02/2021] [Revised: 03/11/2021] [Accepted: 04/01/2021] [Indexed: 12/16/2022] Open
Abstract
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss) creates, maintains and disseminates a portfolio of reliable and state-of-the-art bioinformatics services and resources for the storage, analysis and interpretation of biological data. Through Expasy (https://www.expasy.org), the Swiss Bioinformatics Resource Portal, the scientific community worldwide, freely accesses more than 160 SIB resources supporting a wide range of life science and biomedical research areas. In 2020, Expasy was redesigned through a user-centric approach, known as User-Centred Design (UCD), whose aim is to create user interfaces that are easy-to-use, efficient and targeting the intended community. This approach, widely used in other fields such as marketing, e-commerce, and design of mobile applications, is still scarcely explored in bioinformatics. In total, around 50 people were actively involved, including internal stakeholders and end-users. In addition to an optimised interface that meets users' needs and expectations, the new version of Expasy provides an up-to-date and accurate description of high-quality resources based on a standardised ontology, allowing to connect functionally-related resources.
Affiliation(s)
- Séverine Duvaud
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Chiara Gabella
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Frédérique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, and Computer Science Department, University of Geneva, CH-1227 Geneva, Switzerland; Section of Biology, University of Geneva, CH-1205 Geneva, Switzerland
- Heinz Stockinger
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Vassilios Ioannidis
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Christine Durinx
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
4
Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci 2021; 30:70-82. [PMID: 32881101 PMCID: PMC7737788 DOI: 10.1002/pro.3943] [Citation(s) in RCA: 3559] [Impact Index Per Article: 1186.3] [Received: 06/29/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 12/27/2022]
Abstract
UCSF ChimeraX is the next-generation interactive visualization program from the Resource for Biocomputing, Visualization, and Informatics (RBVI), following UCSF Chimera. ChimeraX brings (a) significant performance and graphics enhancements; (b) new implementations of Chimera's most highly used tools, many with further improvements; (c) several entirely new analysis features; (d) support for new areas such as virtual reality, light-sheet microscopy, and medical imaging data; (e) major ease-of-use advances, including toolbars with icons to perform actions with a single click, basic "undo" capabilities, and more logical and consistent commands; and (f) an app store for researchers to contribute new tools. ChimeraX includes full user documentation and is free for noncommercial use, with downloads available for Windows, Linux, and macOS from https://www.rbvi.ucsf.edu/chimerax.
Affiliation(s)
- Eric F. Pettersen
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Thomas D. Goddard
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Conrad C. Huang
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Elaine C. Meng
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Gregory S. Couch
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Tristan I. Croll
  - Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, Cambridge, UK
- John H. Morris
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Thomas E. Ferrin
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
5
Lachmann A, Clarke DJB, Torre D, Xie Z, Ma'ayan A. Interoperable RNA-Seq analysis in the cloud. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2020; 1863:194521. [PMID: 32156561 DOI: 10.1016/j.bbagrm.2020.194521] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/02/2019] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 12/25/2022]
Abstract
RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deployment in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs to handle the cloud alignment jobs. We next demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify suitable gene count quantification methods to facilitate cost-effective, accurate, and cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high read quality estimation with cost efficient runtime performance. HISAT2 is the fastest of the classical aligners with good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Alexander Lachmann
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Denis Torre
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
6
Wei Q, Zhang Y, Amith M, Lin R, Lapeyrolerie J, Tao C, Xu H. Recognizing software names in biomedical literature using machine learning. Health Informatics J 2019; 26:21-33. [PMID: 31566474 PMCID: PMC7334865 DOI: 10.1177/1460458219869490] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
Software tools now are essential to research and applications in the biomedical domain. However, existing software repositories are mainly built using manual curation, which is time-consuming and unscalable. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning-based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature.
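The F-measure figures quoted in this abstract are harmonic means of precision and recall. The abstract does not define its "inexact matching criteria", so the sketch below makes an assumption: a predicted span counts as correct if it overlaps a gold span at all. The names `Span`, `overlaps`, and `f_measure` are hypothetical and only illustrate the arithmetic; they are not the authors' evaluation code.

```python
from typing import List, Tuple

# A mention is a (start, end) character span, end exclusive (assumed convention).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], predicted: List[Span], inexact: bool = True) -> float:
    """Harmonic mean of precision and recall over entity spans.

    With inexact=True, any overlap counts as a match (an assumption);
    with inexact=False, spans must be identical.
    """
    def match(a: Span, b: Span) -> bool:
        return overlaps(a, b) if inexact else a == b

    # Predicted spans that hit some gold span drive precision.
    hit_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    # Gold spans that were found by some prediction drive recall.
    hit_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))

    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5), (10, 15)]
    predicted = [(0, 4), (20, 25)]
    print(f_measure(gold, predicted, inexact=True))
    print(f_measure(gold, predicted, inexact=False))
```

Under the overlap assumption, a boundary error such as predicting `(0, 4)` for a gold span `(0, 5)` still counts as a hit, which is why inexact scores are never lower than exact ones on the same predictions.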
Affiliation(s)
- Muhammad Amith
  - The University of Texas Health Science Center at Houston, USA
- Hua Xu
  - The University of Texas Health Science Center at Houston, USA
7
RNApolis: Computational Platform for RNA Structure Analysis. Foundations of Computing and Decision Sciences 2019. [DOI: 10.2478/fcds-2019-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
In the 1970s, computer scientists began to engage in research in the field of structural biology. The first structural databases, as well as models and methods supporting the analysis of biomolecule structures, started to be created. RNA was put at the centre of scientific interest quite late. However, more and more methods dedicated to this molecule are currently being developed. This paper presents RNApolis - a new computing platform, which offers access to seven bioinformatic tools developed to support the RNA structure study. The set of tools include a structural database and systems for predicting, modelling, annotating and evaluating the RNA structure. RNApolis supports research at different structural levels and allows the discovery, establishment, and validation of relationships between the primary, secondary and tertiary structure of RNAs. The platform is freely available at http://rnapolis.pl
8
Bagnacani A, Wolfien M, Wolkenhauer O. Tools for Understanding miRNA-mRNA Interactions for Reproducible RNA Analysis. Methods Mol Biol 2019; 1912:199-214. [PMID: 30635895 DOI: 10.1007/978-1-4939-8982-9_8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
MicroRNAs (miRNAs) are an integral part of gene regulation at the post-transcriptional level. The use of RNA data in gene expression analysis has become increasingly important to gain insights into the regulatory mechanisms behind miRNA-mRNA interactions. As a result, we are confronted with a growing landscape of tools, while standards for reproducibility and benchmarking lag behind. This work identifies the challenges for reproducible RNA analysis, and highlights best practices on the processing and dissemination of scientific results. We found that the success of a tool does not solely depend on its performances: equally important is how a tool is received, and then supported within a community. This leads us to a detailed presentation of the RNA workbench, a community effort for sharing workflows and processing tools, built on top of the Galaxy framework. Here, we follow the community guidelines to extend its portfolio of RNA tools with the integration of the TriplexRNA ( https://triplexrna.org ). Our findings provide the basis for the development of a recommendation system, to guide users in the choice of tools and workflows.
Affiliation(s)
- Andrea Bagnacani
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
- Markus Wolfien
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
- Olaf Wolkenhauer
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
  - Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre, Stellenbosch University, Stellenbosch, South Africa
9
Santos HDAD, Oliveira MIS, Lima GDFAB, da Silva KM, S. Muniz RIVC, Lóscio BF. Investigations into data published and consumed on the Web: a systematic mapping study. Journal of the Brazilian Computer Society 2018. [DOI: 10.1186/s13173-018-0077-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
10
Doppelt-Azeroual O, Mareuil F, Deveaud E, Kalaš M, Soranzo N, van den Beek M, Grüning B, Ison J, Ménager H. ReGaTE: Registration of Galaxy Tools in Elixir. Gigascience 2018; 6:1-4. [PMID: 28402416 PMCID: PMC5530318 DOI: 10.1093/gigascience/gix022] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/11/2016] [Accepted: 03/21/2017] [Indexed: 11/14/2022] Open
Abstract
Background Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE .
Affiliation(s)
- Olivia Doppelt-Azeroual
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
- Fabien Mareuil
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
- Eric Deveaud
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, Thormøhlensgate 55, Bergen, Norway
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, NR4 7UG Norwich, United Kingdom
- Marius van den Beek
  - Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Paris, France
- Björn Grüning
  - Department of Computer Science, Albert-Ludwigs-University, Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
- Jon Ison
  - Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Kongens Lyngby, Denmark
- Hervé Ménager
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
11
U-Index, a dataset and an impact metric for informatics tools and databases. Sci Data 2018; 5:180043. [PMID: 29557976 PMCID: PMC5859919 DOI: 10.1038/sdata.2018.43] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/20/2017] [Accepted: 02/08/2018] [Indexed: 01/28/2023] Open
Abstract
Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources’ impact over time. Our dataset differentiates the context in which citations occur to distinguish between ‘awareness’ and ‘usage’, and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.
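The abstract does not give the u-Index formula, so the sketch below does not attempt to reproduce it. It only illustrates the quantity the abstract singles out, the ratio of usage citations to awareness citations, and how resources might be ordered by it. The function names and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple


def usage_to_awareness_ratio(usage: int, awareness: int) -> float:
    """Ratio of 'usage' citations to 'awareness' citations.

    This is NOT the published u-Index, only the ratio mentioned in the
    abstract. A resource with usage but no awareness citations is treated
    as having an infinite ratio (an assumption of this sketch).
    """
    if usage < 0 or awareness < 0:
        raise ValueError("citation counts must be non-negative")
    if awareness == 0:
        return float("inf") if usage > 0 else 0.0
    return usage / awareness


def rank_by_ratio(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Order resource names from highest to lowest usage/awareness ratio."""
    return sorted(
        counts,
        key=lambda name: usage_to_awareness_ratio(*counts[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up (usage, awareness) counts for three hypothetical resources.
    example = {"tool_a": (30, 10), "tool_b": (5, 20), "tool_c": (8, 0)}
    print(rank_by_ratio(example))
```

A raw ratio ignores citation volume and time, both of which the abstract says the real metric accounts for, so this ordering should be read only as a first-pass comparison.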
12
13
Goddard TD, Huang CC, Meng EC, Pettersen EF, Couch GS, Morris JH, Ferrin TE. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci 2018; 27:14-25. [PMID: 28710774 PMCID: PMC5734306 DOI: 10.1002/pro.3235] [Citation(s) in RCA: 2618] [Impact Index Per Article: 436.3] [Received: 06/12/2017] [Revised: 07/07/2017] [Accepted: 07/10/2017] [Indexed: 12/18/2022]
Abstract
UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution. This article highlights some specific advances in the areas of visualization and usability, performance, and extensibility. ChimeraX is free for noncommercial use and is available from http://www.rbvi.ucsf.edu/chimerax/ for Windows, Mac, and Linux.
Affiliation(s)
- Thomas D. Goddard
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- Conrad C. Huang
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- Elaine C. Meng
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- Eric F. Pettersen
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- Gregory S. Couch
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- John H. Morris
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
- Thomas E. Ferrin
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143
14
Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H. Using bio.tools to generate and annotate workbench tool descriptions. F1000Res 2017; 6:ELIXIR-2074. [PMID: 29333231 PMCID: PMC5747335 DOI: 10.12688/f1000research.12974.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 11/26/2017] [Indexed: 11/20/2022] Open
Abstract
Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks, facilitate the access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata.
Affiliation(s)
- Kenzo-Hugo Hillion
- Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Paris, France
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Anton Khodak
- Igor Sikorsky Kyiv Polytechnic Institute, National Technical University of Ukraine, Kyiv, Ukraine
| | - Eric Rasche
- Lehrstuhl für Bioinformatik, Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
| | | | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Jon Ison
- DTU Bioinformatics, Technical University of Denmark, Copenhagen, Denmark
| | - Hervé Ménager
- Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Paris, France
| |
|
15
|
Urdidiales‐Nieto D, Navas‐Delgado I, Aldana‐Montes JF. Biological Web Service Repositories Review. Mol Inform 2017; 36:1600035. [PMID: 27783459 PMCID: PMC5434852 DOI: 10.1002/minf.201600035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/07/2016] [Accepted: 09/27/2016] [Indexed: 12/26/2022]
Abstract
Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re-execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re-check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well-known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories.
Affiliation(s)
- David Urdidiales‐Nieto
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
| | - Ismael Navas‐Delgado
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
| | - José F. Aldana‐Montes
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
| |
|
16
|
Guardia GD, Ferreira Pires L, da Silva EG, de Farias CR. SemanticSCo: A platform to support the semantic composition of services for gene expression analysis. J Biomed Inform 2017; 66:116-128. [DOI: 10.1016/j.jbi.2016.12.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/12/2016] [Revised: 11/27/2016] [Accepted: 12/31/2016] [Indexed: 10/20/2022]
|
17
|
From the evaluation of existing solutions to an all-inclusive package for biobanks. HEALTH AND TECHNOLOGY 2017; 7:89-95. [PMID: 28344915 PMCID: PMC5346419 DOI: 10.1007/s12553-016-0175-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/04/2016] [Accepted: 12/19/2016] [Indexed: 11/26/2022]
Abstract
The domain of biobanking has gone through many stages, and as a result a wide range of commercial and open source software solutions is available. Using these software tools requires different levels of domain and technical skill for installation, configuration and ultimate use. To compound this complexity, the biobanking community is required to work together in order to share knowledge and jointly build solutions to underpin the research infrastructure. We have evaluated the available tools, described them in a catalogue (BiobankApps) and made a selection of tools available to biobanks in a use-case-driven reference toolbox (BIBBOX). In the BiobankApps tool catalogue, both commercial and open source software solutions related to the biobanking domain are included, classified and evaluated. The evaluation covers: 1) “user review” by an authenticated user, 2) domain expert: quick analysis by BBMRI members, and 3) domain expert: detailed analysis and test installation with real-world data. The evaluation is paired with a survey across the more “advanced” (from a technology perspective) biobanks to investigate what tools are currently used, and summarises known benefits and drawbacks of the respective packages. In the second step we recommend tools for specific use cases, and install, configure and connect these in the BIBBOX framework. This service also builds on existing work in the United Kingdom that seeks to establish the motivations for different stakeholders to become involved, thereby helping to prioritise the use cases based on the level of need and support within the research community. All tools associated with a use case are available as BIBBOX applications (technically, this is achieved with Docker containers), which are integrated in the BIBBOX framework with central identification and user management.
In future work we plan to share the acquired knowledge with other networks, develop an Application Programming Interface (API) for the exchange of metadata with other tool catalogues and work on an ontology for the evaluation of biobank software.
|
18
|
Zaveri A, Dastgheib S, Wu C, Whetzel T, Verborgh R, Avillach P, Korodi G, Terryn R, Jagodnik K, Assis P, Dumontier M. smartAPI: Towards a More Intelligent Network of Web APIs. THE SEMANTIC WEB 2017. [DOI: 10.1007/978-3-319-58451-5_11] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/23/2022]
|
19
|
Exploring Protein-Protein Interactions as Drug Targets for Anti-cancer Therapy with In Silico Workflows. Methods Mol Biol 2017; 1647:221-236. [PMID: 28809006 DOI: 10.1007/978-1-4939-7201-2_15] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/01/2022]
Abstract
We describe a computational protocol to aid the design of small molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.
|
20
|
|
21
|
Przybyła P, Shardlow M, Aubin S, Bossy R, Eckart de Castilho R, Piperidis S, McNaught J, Ananiadou S. Text mining resources for the life sciences. Database (Oxford) 2016; 2016:baw145. [PMID: 27888231 PMCID: PMC5199186 DOI: 10.1093/database/baw145] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/08/2016] [Revised: 10/13/2016] [Accepted: 10/17/2016] [Indexed: 11/18/2022]
Abstract
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable, that is, those that have the crucial ability to share information, enabling smooth integration and reusability.
Affiliation(s)
- Piotr Przybyła
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, UK
| | - Matthew Shardlow
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, UK
| | - Sophie Aubin
- Institut National de la Recherche Agronomique, Jouy-en-Josas, France
| | - Robert Bossy
- Institut National de la Recherche Agronomique, Jouy-en-Josas, France
| | | | - Stelios Piperidis
- Institute for Language and Speech Processing, Athena Research Center, Athens, Greece
| | - John McNaught
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, UK
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, UK
| |
|
22
|
Hardisty AR, Bacall F, Beard N, Balcázar-Vargas MP, Balech B, Barcza Z, Bourlat SJ, De Giovanni R, de Jong Y, De Leo F, Dobor L, Donvito G, Fellows D, Guerra AF, Ferreira N, Fetyukova Y, Fosso B, Giddy J, Goble C, Güntsch A, Haines R, Ernst VH, Hettling H, Hidy D, Horváth F, Ittzés D, Ittzés P, Jones A, Kottmann R, Kulawik R, Leidenberger S, Lyytikäinen-Saarenmaa P, Mathew C, Morrison N, Nenadic A, de la Hidalga AN, Obst M, Oostermeijer G, Paymal E, Pesole G, Pinto S, Poigné A, Fernandez FQ, Santamaria M, Saarenmaa H, Sipos G, Sylla KH, Tähtinen M, Vicario S, Vos RA, Williams AR, Yilmaz P. BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology. BMC Ecol 2016; 16:49. [PMID: 27765035 PMCID: PMC5073428 DOI: 10.1186/s12898-016-0103-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 05/03/2016] [Accepted: 10/13/2016] [Indexed: 02/08/2023] Open
Abstract
Background Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as “Web services”) and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust “in silico” science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. Results BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible ‘virtual laboratory’, free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. 
Conclusions Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.
Affiliation(s)
- Alex R Hardisty
- School of Computer Science and Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff, CF24 3AA, UK
| | - Finn Bacall
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Niall Beard
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Maria-Paula Balcázar-Vargas
- Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, PO Box 94248, 1090, Amsterdam, The Netherlands
| | - Bachir Balech
- Institute of Biomembranes and Bioenergetics (IBBE), National Research Council (CNR), via Amendola 165/A, 70126, Bari, Italy
| | - Zoltán Barcza
- Department of Meteorology, Eötvös Loránd University, Pázmány sétány 1/A, Budapest, 1117, Hungary
| | - Sarah J Bourlat
- Department of Marine Sciences, University of Gothenburg, Box 463, 405 30, Gothenburg, Sweden
| | - Renato De Giovanni
- Centro de Referência em Informação Ambiental, Avenida Dr. Romeu Tórtima, 388, Campinas, SP, 13084-791, Brazil
| | - Yde de Jong
- Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, PO Box 94248, 1090, Amsterdam, The Netherlands; SIB Labs, Joensuu Science Park, University of Eastern Finland, P.O. Box 111, 80101, Joensuu, Finland
| | - Francesca De Leo
- Institute of Biomembranes and Bioenergetics (IBBE), National Research Council (CNR), via Amendola 165/A, 70126, Bari, Italy
| | - Laura Dobor
- Department of Meteorology, Eötvös Loránd University, Pázmány sétány 1/A, Budapest, 1117, Hungary
| | - Giacinto Donvito
- Institute of Nuclear Physics (INFN), Via E. Orabona 4, 70125, Bari, Italy
| | - Donal Fellows
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Antonio Fernandez Guerra
- Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, 28359, Bremen, Germany; Jacobs University Bremen GmbH, Campus Ring 1, 28359, Bremen, Germany
| | - Nuno Ferreira
- Stichting EGI (EGI.eu), Science Park 140, 1098, Amsterdam, The Netherlands
| | - Yuliya Fetyukova
- SIB Labs, Joensuu Science Park, University of Eastern Finland, P.O. Box 111, 80101, Joensuu, Finland
| | - Bruno Fosso
- Institute of Biomembranes and Bioenergetics (IBBE), National Research Council (CNR), via Amendola 165/A, 70126, Bari, Italy
| | - Jonathan Giddy
- School of Computer Science and Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff, CF24 3AA, UK
| | - Carole Goble
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Anton Güntsch
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Königin-Luise-Strasse 6-8, 14195, Berlin, Germany
| | - Robert Haines
- IT Services, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Vera Hernández Ernst
- Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
| | - Hannes Hettling
- Naturalis Biodiversity Center, Postbus 9517, 2300, Leiden, The Netherlands
| | - Dóra Hidy
- MTA-SZIE Plant Ecology Research Group, Szent István University, Páter K. u.1., Gödöllő, 2103, Hungary
| | - Ferenc Horváth
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Alkotmány u. 2-4., Vácrátót, 2163, Hungary
| | - Dóra Ittzés
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Alkotmány u. 2-4., Vácrátót, 2163, Hungary
| | - Péter Ittzés
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Alkotmány u. 2-4., Vácrátót, 2163, Hungary
| | - Andrew Jones
- School of Computer Science and Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff, CF24 3AA, UK
| | - Renzo Kottmann
- Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, 28359, Bremen, Germany
| | - Robert Kulawik
- Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
| | - Sonja Leidenberger
- Swedish Species Information Centre/ArtDatabanken, Swedish University of Agricultural Sciences, Bäcklösavägen 10, 750 07, Uppsala, Sweden
| | | | - Cherian Mathew
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Königin-Luise-Strasse 6-8, 14195, Berlin, Germany
| | - Norman Morrison
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Aleksandra Nenadic
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Abraham Nieva de la Hidalga
- School of Computer Science and Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff, CF24 3AA, UK
| | - Matthias Obst
- Department of Marine Sciences, University of Gothenburg, Box 463, 405 30, Gothenburg, Sweden
| | - Gerard Oostermeijer
- Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, PO Box 94248, 1090, Amsterdam, The Netherlands
| | - Elisabeth Paymal
- Fondation pour la Recherche sur la Biodiversité (FRB), 195, rue Saint-Jacques, 75005, Paris, France
| | - Graziano Pesole
- Institute of Biomembranes and Bioenergetics (IBBE), National Research Council (CNR), via Amendola 165/A, 70126, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "A. Moro", via Orabona, 1514, 70126, Bari, Italy
| | - Salvatore Pinto
- Stichting EGI (EGI.eu), Science Park 140, 1098, Amsterdam, The Netherlands
| | - Axel Poigné
- Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
| | - Francisco Quevedo Fernandez
- School of Computer Science and Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff, CF24 3AA, UK
| | - Monica Santamaria
- Institute of Biomembranes and Bioenergetics (IBBE), National Research Council (CNR), via Amendola 165/A, 70126, Bari, Italy
| | - Hannu Saarenmaa
- SIB Labs, Joensuu Science Park, University of Eastern Finland, P.O. Box 111, 80101, Joensuu, Finland
| | - Gergely Sipos
- Stichting EGI (EGI.eu), Science Park 140, 1098, Amsterdam, The Netherlands
| | - Karl-Heinz Sylla
- Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
| | - Marko Tähtinen
- Finnish Museum of Natural History, University of Helsinki, P.O. Box 17, 00014, Helsinki, Finland
| | - Saverio Vicario
- Institute of Biomedical Technology (ITB), National Research Council (CNR), via Amendola 122/D, 70126, Bari, Italy
| | - Rutger Aldo Vos
- Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, PO Box 94248, 1090, Amsterdam, The Netherlands; Naturalis Biodiversity Center, Postbus 9517, 2300, Leiden, The Netherlands
| | - Alan R Williams
- School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, UK
| | - Pelin Yilmaz
- Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, 28359, Bremen, Germany
| |
|
23
|
How Aphia—The Platform behind Several Online and Taxonomically Oriented Databases—Can Serve Both the Taxonomic Community and the Field of Biodiversity Informatics. JOURNAL OF MARINE SCIENCE AND ENGINEERING 2015. [DOI: 10.3390/jmse3041448] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
|
24
|
Kanterakis A, Kuiper J, Potamias G, Swertz MA. PyPedia: using the wiki paradigm as crowd sourcing environment for bioinformatics protocols. SOURCE CODE FOR BIOLOGY AND MEDICINE 2015; 10:14. [PMID: 26587054 PMCID: PMC4652372 DOI: 10.1186/s13029-015-0042-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/23/2015] [Accepted: 10/20/2015] [Indexed: 11/10/2022]
Abstract
Background Today researchers can choose from many bioinformatics protocols for all types of life sciences research, computational environments and coding languages. Although the majority of these are open source, few of them possess all virtues to maximize reuse and promote reproducible science. Wikipedia has proven a great tool to disseminate information and enhance collaboration between users with varying expertise and background to author qualitative content via crowdsourcing. However, it remains an open question whether the wiki paradigm can be applied to bioinformatics protocols. Results We piloted PyPedia, a wiki where each article is both implementation and documentation of a bioinformatics computational protocol in the Python language. Hyperlinks within the wiki can be used to compose complex workflows and induce reuse. A RESTful API enables code execution outside the wiki. Initial content of PyPedia contains articles for population statistics, bioinformatics format conversions and genotype imputation. Use of the easy-to-learn wiki syntax effectively lowers the barriers to bring expert programmers and less computer-savvy researchers on the same page. Conclusions PyPedia demonstrates how a wiki can provide a collaborative development, sharing and even execution environment for biologists and bioinformaticians that complements existing resources, useful for local and multi-center research teams. Availability PyPedia is available online at: http://www.pypedia.com. The source code and installation instructions are available at: https://github.com/kantale/PyPedia_server. The PyPedia Python library is available at: https://github.com/kantale/pypedia. PyPedia is open-source, available under the BSD 2-Clause License.
Affiliation(s)
- Alexandros Kanterakis
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Postbus 30 001, Groningen, 9700 RB The Netherlands; Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), Nikolaou Plastira 100, Heraklion, 71110 Greece
| | - Joël Kuiper
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Postbus 30 001, Groningen, 9700 RB The Netherlands
| | - George Potamias
- Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), Nikolaou Plastira 100, Heraklion, 71110 Greece
| | - Morris A Swertz
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Postbus 30 001, Groningen, 9700 RB The Netherlands
| |
|
25
|
Ison J, Rapacki K, Ménager H, Kalaš M, Rydza E, Chmura P, Anthon C, Beard N, Berka K, Bolser D, Booth T, Bretaudeau A, Brezovsky J, Casadio R, Cesareni G, Coppens F, Cornell M, Cuccuru G, Davidsen K, Vedova GD, Dogan T, Doppelt-Azeroual O, Emery L, Gasteiger E, Gatter T, Goldberg T, Grosjean M, Grüning B, Helmer-Citterich M, Ienasescu H, Ioannidis V, Jespersen MC, Jimenez R, Juty N, Juvan P, Koch M, Laibe C, Li JW, Licata L, Mareuil F, Mičetić I, Friborg RM, Moretti S, Morris C, Möller S, Nenadic A, Peterson H, Profiti G, Rice P, Romano P, Roncaglia P, Saidi R, Schafferhans A, Schwämmle V, Smith C, Sperotto MM, Stockinger H, Vařeková RS, Tosatto SCE, de la Torre V, Uva P, Via A, Yachdav G, Zambelli F, Vriend G, Rost B, Parkinson H, Løngreen P, Brunak S. Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Res 2015; 44:D38-47. [PMID: 26538599 PMCID: PMC4702812 DOI: 10.1093/nar/gkv1116] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 09/17/2015] [Accepted: 10/13/2015] [Indexed: 01/24/2023] Open
Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Affiliation(s)
- Jon Ison
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Kristoffer Rapacki
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Hervé Ménager
- Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, Norway
| | - Emil Rydza
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Piotr Chmura
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Christian Anthon
- Department of Veterinary Clinical and Animal Sciences, Faculty for Health and Medical Sciences, University of Copenhagen, Denmark
| | - Niall Beard
- School of Computer Science, University of Manchester, UK
| | - Karel Berka
- Department of Physical Chemistry, RCPTM, Faculty of Science, Palacky University, Czech Republic
| | - Dan Bolser
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Tim Booth
- NEBC Wallingford, Centre for Ecology and Hydrology, UK
| | - Anthony Bretaudeau
- INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), France; INRIA, IRISA, GenOuest Core Facility, France
| | - Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, Czech Republic
| | - Rita Casadio
- Bologna Biocomputing Group, University of Bologna, Italy
| | | | - Frederik Coppens
- Department of Plant Systems Biology, VIB, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
| | | | | | - Kristian Davidsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | - Tunca Dogan
- UniProt, European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Laura Emery
- The European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Thomas Gatter
- Faculty of Technology and Center for Biotechnology, Universität Bielefeld, Germany
| | | | - Marie Grosjean
- Institut Français de Bioinformatique (French Institute of Bioinformatics), CNRS, UMS3601, France
| | - Björn Grüning
- Albert-Ludwigs-Universität Freiburg, Fahnenbergplatz, 79085 Freiburg
| | | | - Hans Ienasescu
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Denmark
| | | | - Martin Closter Jespersen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | - Nick Juty
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Peter Juvan
- Centre for Functional Genomics and Biochips, Faculty of Medicine, University of Ljubljana, Slovenia
| | | | - Camille Laibe
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Jing-Woei Li
- Faculty of Medicine, The Chinese University of Hong Kong, China; Hong Kong Bioinformatics Centre, School of Life Sciences, The Chinese University of Hong Kong, China
| | - Luana Licata
- Dept. of Biology, University of Rome Tor Vergata, Italy
| | - Fabien Mareuil
- Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
| | - Ivan Mičetić
- Department of Biomedical Sciences, University of Padua, Italy
| | | | - Sebastien Moretti
- SIB Swiss Institute of Bioinformatics, Switzerland; Department of Ecology and Evolution, Biophore, Evolutionary Bioinformatics group, University of Lausanne, Switzerland
| | | | - Steffen Möller
- Department of Dermatology, University of Lübeck, Germany; Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Germany
| | | | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Estonia
| | | | - Peter Rice
- Department of Computing, William Penney Laboratory, Imperial College London, UK
| | | | | | - Rabie Saidi
- UniProt, European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Veit Schwämmle
- Protein Research Group, Department for Biochemistry and Molecular Biology, University of Southern Denmark, Denmark
| | | | - Maria Maddalena Sperotto
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | | | | | - Victor de la Torre
- National Bioinformatics Institute Unit (INB), Fundacion Centro Nacional de Investigaciones Oncologicas, Spain
| | | | - Allegra Via
- Dept. of Physics, Sapienza University, Italy
| | - Guy Yachdav
- Department of Informatics, Bioinformatics-I12, TUM, Germany
| | - Federico Zambelli
- Institute of Biomembranes and Bioenergetics, National Research Council (CNR), and Dept. of Biosciences, University of Milano, Italy
| | - Gert Vriend
- Radboud University Medical Centre, CMBI, Netherlands
| | - Burkhard Rost
- Department of Informatics, Bioinformatics-I12, TUM, Germany
| | | | - Peter Løngreen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Søren Brunak
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
| |
|
26
|
Sfakianaki P, Koumakis L, Sfakianakis S, Iatraki G, Zacharioudakis G, Graf N, Marias K, Tsiknakis M. Semantic biomedical resource discovery: a Natural Language Processing framework. BMC Med Inform Decis Mak 2015; 15:77. [PMID: 26423616 PMCID: PMC4591066 DOI: 10.1186/s12911-015-0200-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2015] [Accepted: 09/21/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A plethora of publicly available biomedical resources currently exists and is growing rapidly. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for biomedical resources annotation with domain specific ontologies and exploit Natural Language Processing methods in empowering the non-Information Technology expert users to efficiently search for biomedical resources using natural language. METHODS A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resources descriptions are domain oriented and enhance the accuracy of services discovery. The framework is freely available as a web application at http://calchas.ics.forth.gr/. RESULTS For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has high precision and low recall, meaning that most of the results it returns are relevant, although it misses many of the relevant resources. CONCLUSIONS Adequate biomedical ontologies, sufficient NLP tools, and biomedical annotation systems of sufficient quality are already available for the implementation of a biomedical resources discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the application of the proposed framework, which aims to bridge the gap between clinical questions in natural language and efficient dynamic biomedical resources discovery.
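The evaluation described in this abstract compares an expert-selected set of tools with the set returned automatically. A minimal sketch of how precision and recall are computed for such a comparison follows; the tool names and the sets are invented for illustration and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a gold-standard set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    # Precision: fraction of returned items that are relevant.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant items that were returned.
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: experts picked five tools, the system returned two of them.
expert_tools = {"tool_a", "tool_b", "tool_c", "tool_d", "tool_e"}
system_tools = {"tool_a", "tool_b"}

p, r = precision_recall(system_tools, expert_tools)
print(p, r)  # prints 1.0 0.4
```

In this toy case every returned tool is relevant (precision 1.0) but three of the five relevant tools are missed (recall 0.4), which is the high-precision, low-recall pattern the abstract reports.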
Affiliation(s)
- Pepi Sfakianaki
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Lefteris Koumakis
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Stelios Sfakianakis
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Galatia Iatraki
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Giorgos Zacharioudakis
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Norbert Graf
- Paediatric Haematology and Oncology, Saarland University Hospital, Homburg, Germany
| | - Kostas Marias
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
| | - Manolis Tsiknakis
- Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete Greece
- Department of Informatics Engineering, Technological Educational Institute, Heraklion, Crete Greece
| |
|
27
|
Dahlö M, Haziza F, Kallio A, Korpelainen E, Bongcam-Rudloff E, Spjuth O. BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences. Bioinform Biol Insights 2015; 9:125-8. [PMID: 26401099 PMCID: PMC4567039 DOI: 10.4137/bbi.s28636] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2015] [Revised: 06/29/2015] [Accepted: 07/05/2015] [Indexed: 12/14/2022] Open
Abstract
Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org.
Affiliation(s)
- Martin Dahlö
- SNIC-UPPMAX, Department of Information Technology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Frédéric Haziza
- SNIC-UPPMAX, Department of Information Technology, Uppsala University, Uppsala, Sweden
| | | | | | - Erik Bongcam-Rudloff
- SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Ola Spjuth
- SNIC-UPPMAX, Department of Information Technology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| |
|
28
|
JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing. PLoS One 2015; 10:e0134273. [PMID: 26280450 PMCID: PMC4539224 DOI: 10.1371/journal.pone.0134273] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2015] [Accepted: 07/07/2015] [Indexed: 12/04/2022] Open
Abstract
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
|
29
|
Costa GCB, Braga R, David JMN, Campos F. A Scientific Software Product Line for the Bioinformatics domain. J Biomed Inform 2015; 56:239-64. [DOI: 10.1016/j.jbi.2015.05.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2015] [Revised: 04/04/2015] [Accepted: 05/19/2015] [Indexed: 11/17/2022]
|
30
|
Dobor L, Barcza Z, Hlásny T, Havasi Á, Horváth F, Ittzés P, Bartholy J. Bridging the gap between climate models and impact studies: the FORESEE Database. GEOSCIENCE DATA JOURNAL 2015; 2:1-11. [PMID: 28616227 PMCID: PMC5445562 DOI: 10.1002/gdj3.22] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/27/2014] [Revised: 11/13/2014] [Accepted: 11/28/2014] [Indexed: 06/07/2023]
Abstract
Studies on climate change impacts are essential for identifying vulnerabilities and developing adaptation options. However, such studies depend crucially on the availability of reliable climate data. In this study, we introduce the climatological database called FORESEE (Open Database for Climate Change Related Impact Studies in Central Europe), which was developed to support the research of and adaptation to climate change in Central and Eastern Europe: the region where knowledge of possible climate change effects is inadequate. A questionnaire-based survey was used to specify database structure and content. FORESEE contains the seamless combination of gridded daily observation-based data (1951-2013) built on the E-OBS and CRU TS datasets, and a collection of climate projections (2014-2100). The future climate is represented by bias-corrected meteorological data from 10 regional climate models (RCMs), driven by the A1B emission scenario. These latter data were developed within the frame of the ENSEMBLES FP6 project. Although FORESEE only covers a limited area of Central and Eastern Europe, the methodology of database development, the applied bias correction techniques, and the data dissemination method, can serve as a blueprint for similar initiatives.
Affiliation(s)
- L. Dobor
- Department of Meteorology, Eötvös Loránd University, Budapest, Hungary
| | - Z. Barcza
- Department of Meteorology, Eötvös Loránd University, Budapest, Hungary
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Vácrátót, Hungary
| | - T. Hlásny
- National Forest Centre – Forest Research Institute, Zvolen, Slovakia
- Faculty of Forestry and Wood Sciences, Czech University of Life Sciences, Prague, Czech Republic
| | - Á. Havasi
- Department of Applied Analysis and Computational Mathematics, Eötvös Loránd University, Budapest, Hungary
| | - F. Horváth
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Vácrátót, Hungary
| | - P. Ittzés
- Institute of Ecology and Botany, Centre for Ecological Research, Hungarian Academy of Sciences, Vácrátót, Hungary
| | - J. Bartholy
- Department of Meteorology, Eötvös Loránd University, Budapest, Hungary
| |
|
31
|
Velloso H, Vialle RA, Ortega JM. BOWS (bioinformatics open web services) to centralize bioinformatics tools in web services. BMC Res Notes 2015; 8:206. [PMID: 26032494 PMCID: PMC4467627 DOI: 10.1186/s13104-015-1190-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2014] [Accepted: 05/20/2015] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Bioinformaticians face a range of difficulties in getting locally installed tools running and producing results; they would greatly benefit from a system that could centralize most of the tools, using an easy interface for input and output. Web services, due to their universal nature and widely known interface, constitute a very good option to achieve this goal. RESULTS Bioinformatics open web services (BOWS) is a system based on generic web services produced to allow programmatic access to applications running on high-performance computing (HPC) clusters. BOWS mediates access to registered tools by providing front-end and back-end web services. Programmers can install applications in HPC clusters in any programming language and use the back-end service to check for new jobs and their parameters, and then to send the results to BOWS. Programs running on ordinary computers consume the BOWS front-end service to submit new processes and read results. BOWS compiles Java clients, which encapsulate the front-end web service requests, and automatically creates a web page that lists the registered applications and clients. CONCLUSIONS Applications registered with BOWS can be accessed from virtually any programming language through web services, or using standard Java clients. The back-end can run in HPC clusters, allowing bioinformaticians to remotely run computationally demanding applications directly from their machines.
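The front-end/back-end split described in this abstract can be sketched as a small in-memory job broker. This is an illustration of the general pattern only, not the BOWS API: the class `JobBroker`, its method names, and the `revcomp` tool are all hypothetical, and a real deployment would expose these operations as web services rather than method calls.

```python
import itertools


class JobBroker:
    """Hypothetical stand-in for a service that mediates between clients and a cluster."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    # Front-end side: a client submits a job and later reads its result.
    def submit(self, tool, params):
        job_id = next(self._ids)
        self._jobs[job_id] = {"tool": tool, "params": params,
                              "status": "pending", "result": None}
        return job_id

    def read_result(self, job_id):
        job = self._jobs[job_id]
        return job["result"] if job["status"] == "done" else None

    # Back-end side: the cluster-side program polls for work and posts results.
    def next_pending(self, tool):
        for job_id, job in self._jobs.items():
            if job["tool"] == tool and job["status"] == "pending":
                job["status"] = "running"
                return job_id, job["params"]
        return None

    def post_result(self, job_id, result):
        job = self._jobs[job_id]
        job["result"] = result
        job["status"] = "done"


broker = JobBroker()
job_id = broker.submit("revcomp", {"seq": "AACG"})  # client submits a job
pending_view = broker.read_result(job_id)           # None: not finished yet

claimed = broker.next_pending("revcomp")             # back-end polls for work
if claimed is not None:
    claimed_id, params = claimed
    complement = params["seq"].translate(str.maketrans("ACGT", "TGCA"))
    broker.post_result(claimed_id, complement[::-1])  # reverse complement

final_view = broker.read_result(job_id)
print(final_view)  # prints CGTT
```

Keeping submission and execution behind separate interfaces is what lets the cluster-side program be written in any language, as the abstract notes: it only needs to poll for jobs and post results.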
Affiliation(s)
- Henrique Velloso
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil.
| | - Ricardo A Vialle
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil.
| | - J Miguel Ortega
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil.
| |
|
32
|
Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies. BIOMED RESEARCH INTERNATIONAL 2015; 2015:904541. [PMID: 26125026 PMCID: PMC4466500 DOI: 10.1155/2015/904541] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2014] [Revised: 04/01/2015] [Accepted: 04/01/2015] [Indexed: 02/07/2023]
Abstract
Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.
|
33
|
Drug discovery FAQs: workflows for answering multidomain drug discovery questions. Drug Discov Today 2015; 20:399-405. [DOI: 10.1016/j.drudis.2014.11.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2014] [Revised: 10/22/2014] [Accepted: 11/13/2014] [Indexed: 12/26/2022]
|
34
|
Duck G, Nenadic G, Brass A, Robertson DL, Stevens R. Extracting patterns of database and software usage from the bioinformatics literature. Bioinformatics 2015; 30:i601-8. [PMID: 25161253 PMCID: PMC4147923 DOI: 10.1093/bioinformatics/btu471] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. RESULTS We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. AVAILABILITY AND IMPLEMENTATION The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/.
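The network construction described in this abstract rests on resource co-occurrence: two resources are linked when they are mentioned in the same article, and the link weight is the number of such articles. A minimal sketch under that assumption follows; the function name `cooccurrence_edges` and the three toy "papers" are illustrative and are not taken from the authors' scripts or data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for each unordered pair of resources, how many documents mention both."""
    edges = Counter()
    for mentions in documents:
        # Deduplicate within a document and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(mentions)), 2):
            edges[(a, b)] += 1
    return edges


# Toy input: the resources mentioned in three hypothetical articles.
papers = [
    ["ClustalW", "PhyML", "GenBank"],
    ["ClustalW", "PhyML"],
    ["MUSCLE", "PhyML", "GenBank"],
]

edges = cooccurrence_edges(papers)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Building one such edge table per publication year would give the time-sliced snapshots the abstract refers to, with persistent pairs showing up as edges present across many years.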
Affiliation(s)
- Geraint Duck
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
| | - Goran Nenadic
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
| | - Andy Brass
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
| | - David L Robertson
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
| | - Robert Stevens
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
| |
|
35
|
Abstract
With the availability of numerous curated databases, researchers are now able to efficiently use the multitude of biological data by integrating these resources via hyperlinks and cross-references. A large proportion of bioinformatics research, however, still involves labor-intensive tasks such as fetching, parsing, and merging datasets and functional annotations from distributed multi-domain databases. This data integration issue is one of the key challenges in bioinformatics. We aim to provide an identifier conversion and data aggregation system as part of a solution to this problem with a service named G-Links, 1) by gathering resource URI information from 130 databases and 30 web services in a gene-centric manner so that users can retrieve all available links about a given gene, 2) by providing a RESTful API for easy retrieval of links, including facet searching based on keywords and/or predicate types, and 3) by producing a variety of outputs, including a visual HTML page, tab-delimited text, and Semantic Web formats such as Notation3 and RDF. G-Links as well as other relevant documentation are available at http://link.g-language.org/.
Affiliation(s)
- Kazuki Oshita
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
| | - Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
| |
|
36
|
Repchevsky D, Gelpi JL. BioSWR--semantic web services registry for bioinformatics. PLoS One 2014; 9:e107889. [PMID: 25233118 PMCID: PMC4169436 DOI: 10.1371/journal.pone.0107889] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2014] [Accepted: 08/21/2014] [Indexed: 11/28/2022] Open
Abstract
Despite the variety of available Web services registries aimed specifically at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on Web Service Definition Language (WSDL). Although WSDL is flexible enough to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides a Web-based interface for Web services registration, querying and annotation, and is also accessible programmatically via a Representational State Transfer (REST) API or using the SPARQL Protocol and RDF Query Language. The BioSWR server is located at http://inb.bsc.es/BioSWR/ and its code is available at https://sourceforge.net/projects/bioswr/ under the LGPL license.
Affiliation(s)
- Dmitry Repchevsky
- Barcelona Supercomputing Center, Life-Sciences Department, National Institute of Bioinformatics, Computational Bioinformatics Node, Barcelona, Spain
| | - Josep Ll. Gelpi
- Barcelona Supercomputing Center, Life-Sciences Department, National Institute of Bioinformatics, Computational Bioinformatics Node, Barcelona, Spain
- Department of Biochemistry and Molecular Biology, University of Barcelona, Barcelona, Spain
| |
|
37
|
Tsiliki G, Kossida S, Friesen N, Rüping S, Tzagarakis M, Karacapilidis N. A Data Mining Based Approach for Collaborative Analysis of Biomedical Data. INT J ARTIF INTELL T 2014. [DOI: 10.1142/s0218213014600100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Biomedical research is becoming increasingly multidisciplinary and collaborative in nature. At the same time, it has recently seen a vast growth in publicly and instantly available information. As the available resources become more specialized, there is a growing need for multidisciplinary collaborations between biomedical researchers to address complex research questions. We present an application of a data mining algorithm to genomic data in a collaborative decision-making support environment, as a typical example of how multidisciplinary researchers can collaborate in analyzing and interpreting biomedical data. Through the proposed approach, researchers can easily decide which data repositories should be considered, analyze the algorithmic results, discuss the weaknesses of the patterns identified, and set up new iterations of the data mining algorithm by defining other descriptive attributes or integrating other relevant data. Evaluation results show that the proposed approach helps users set their research objectives and better understand the data and methodologies used in their research.
Affiliation(s)
- Georgia Tsiliki
- Bioinformatics and Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou 115 27, Greece
| | - Sophia Kossida
- Bioinformatics and Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou 115 27, Greece
| | - Natalja Friesen
- Knowledge Discovery Group, Fraunhofer Institute IAIS, Sankt Augustin, Germany
| | - Stefan Rüping
- Knowledge Discovery Group, Fraunhofer Institute IAIS, Sankt Augustin, Germany
| | - Manolis Tzagarakis
- University of Patras and Computer Technology Institute & Press “Diophantus”, Rio Patras, Greece
| | - Nikos Karacapilidis
- University of Patras and Computer Technology Institute & Press “Diophantus”, Rio Patras, Greece
| |
|
38
|
Rak R, Batista-Navarro RT, Carter J, Rowley A, Ananiadou S. Processing biological literature with customizable Web services supporting interoperable formats. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau064. [PMID: 25006225 PMCID: PMC4086403 DOI: 10.1093/database/bau064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk.
Affiliation(s)
- Rafal Rak
- National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101
| | - Riza Theresa Batista-Navarro
- National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101
| | - Jacob Carter
- National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101
| | - Andrew Rowley
- National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101
| |
|
39
|
Malone J, Brown A, Lister AL, Ison J, Hull D, Parkinson H, Stevens R. The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation. J Biomed Semantics 2014; 5:25. [PMID: 25068035 PMCID: PMC4098953 DOI: 10.1186/2041-1480-5-25] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2013] [Accepted: 04/19/2014] [Indexed: 01/07/2023] Open
Abstract
Motivation Biomedical ontologists to date have concentrated on ontological descriptions of biomedical entities such as gene products and their attributes, phenotypes and so on. Recently, effort has diversified to descriptions of the laboratory investigations by which these entities were produced. However, much biological insight is gained from the analysis of the data produced from these investigations, and there is a lack of adequate descriptions of the wide range of software that is central to bioinformatics. We need to describe how data are analyzed for discovery, audit trails, provenance and reproducibility. Results The Software Ontology (SWO) is a description of software used to store, manage and analyze data. Input to the SWO has come from beyond the life sciences, but its main focus is the life sciences. We used agile techniques to gather input for the SWO and keep engagement with our users. The result is an ontology that meets the needs of a broad range of users by describing software, its information processing tasks, data inputs and outputs, data formats, versions and so on. Recently, the SWO has incorporated EDAM, a vocabulary for describing data and related concepts in bioinformatics. The SWO is currently being used to describe software used in multiple biomedical applications. Conclusion The SWO is another element of the biomedical ontology landscape that is necessary for the description of biomedical entities and how they were discovered. An ontology of software used to analyze data produced by investigations in the life sciences can be made in such a way that it covers the important features requested and prioritized by its users. The SWO thus fits into the landscape of biomedical ontologies and is produced using techniques designed to keep it in line with users' needs.
Availability The Software Ontology is available under an Apache 2.0 license at http://theswo.sourceforge.net/; the Software Ontology blog can be read at http://softwareontology.wordpress.com.
Affiliation(s)
- James Malone: EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Andy Brown: School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Allyson L Lister: School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Jon Ison: EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Duncan Hull: School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Helen Parkinson: EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Robert Stevens: School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
|
40
|
Masseroli M, Mons B, Bongcam-Rudloff E, Ceri S, Kel A, Rechenmann F, Lisacek F, Romano P. Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information. BMC Bioinformatics 2014; 15 Suppl 1:S2. [PMID: 24564249 PMCID: PMC4015876 DOI: 10.1186/1471-2105-15-s1-s2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023] Open
Abstract
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to limit the sharing of information and knowledge with other scientists.
We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. Finally, we discuss an open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms.
Affiliation(s)
- Marco Masseroli: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Barend Mons: Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands; Netherlands Bioinformatics Center, Nijmegen, 6500 HB, The Netherlands
- Erik Bongcam-Rudloff: Department of Animal Breeding and Genetics, SLU-Global Bioinformatics Centre, Swedish University of Agricultural Sciences, Uppsala, 75124, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, 75108, Sweden
- Stefano Ceri: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, 20133, Italy
- Alexander Kel: GeneXplain GmbH, Wolfenbüttel, 38302, Germany; Institute of Chemical Biology and Fundamental Medicine SBRAS, Novosibirsk, 630090, Russia
- Frederique Lisacek: Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva 4, Switzerland; Section of Biology, University of Geneva, 1211 Geneva 4, Switzerland
- Paolo Romano: Biopolymers and Proteomics, IRCCS AOU San Martino IST, Genoa, 16132, Italy
|
41
|
Masseroli M, Picozzi M, Ghisalberti G, Ceri S. Explorative search of distributed bio-data to answer complex biomedical questions. BMC Bioinformatics 2014; 15 Suppl 1:S3. [PMID: 24564278 PMCID: PMC4015759 DOI: 10.1186/1471-2105-15-s1-s3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract those data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which frequently are ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions. RESULTS A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions. CONCLUSIONS By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect obtained results, select the most appropriate, expand or refine them and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that could eventually find the most relevant results. Thus, it provides extremely useful automated support for exploratory integrated bio search, which is fundamental for Life Science data driven knowledge discovery.
|
42
|
Kamdar MR, Zeginis D, Hasnain A, Decker S, Deus HF. ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research. J Biomed Inform 2013; 47:112-30. [PMID: 24135450 DOI: 10.1016/j.jbi.2013.10.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/25/2013] [Revised: 09/22/2013] [Accepted: 10/01/2013] [Indexed: 10/26/2022]
Abstract
Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles towards its usage by non-technical research audiences. In particular, fully exploiting integrated information requires improved interactive methods that are intuitive to biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions and time and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated in less than two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased intuitiveness of the platform and usability of semantic web technologies by domain users.
Affiliation(s)
- Maulik R Kamdar: Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Ireland.
- Dimitris Zeginis: Centre for Research and Technology Hellas, Thessaloniki, Greece; Information Systems Lab, University of Macedonia, Thessaloniki, Greece.
- Ali Hasnain: Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Ireland.
- Stefan Decker: Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Ireland.
- Helena F Deus: Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Ireland.
|
43
|
Cokelaer T, Pultz D, Harder LM, Serra-Musach J, Saez-Rodriguez J. BioServices: a common Python package to access biological Web Services programmatically. Bioinformatics 2013; 29:3241-2. [PMID: 24064416 PMCID: PMC3842755 DOI: 10.1093/bioinformatics/btt547] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Web interfaces provide access to numerous biological databases. Many can be accessed programmatically thanks to Web Services. Building applications that combine several of them would benefit from a single framework. RESULTS BioServices is a comprehensive Python framework that provides programmatic access to major bioinformatics Web Services (e.g. KEGG, UniProt, BioModels, ChEMBLdb). Wrapping additional Web Services based either on Representational State Transfer or Simple Object Access Protocol/Web Services Description Language technologies is eased by the usage of object-oriented programming. AVAILABILITY AND IMPLEMENTATION BioServices releases and documentation are available at http://pypi.python.org/pypi/bioservices under a GPL-v3 license.
Affiliation(s)
- Thomas Cokelaer: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense 5230, Denmark, Translational Research Laboratory, Breast Cancer Unit, Catalan Institute of Oncology (ICO), Bellvitge Institute for Biomedical Research (IDIBELL), Gran via 199, L'Hospitalet del Llobregat, Barcelona 08908, Catalonia, Spain and Biomedical Research Institute of Girona, Girona 17007, Catalonia, Spain
|
44
|
Pipelined data‐flow delegated orchestration for data‐intensive eScience workflows. International Journal of Web Information Systems 2013. [DOI: 10.1108/ijwis-05-2013-0012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
PURPOSE eScience workflows use orchestration to integrate and coordinate distributed and heterogeneous scientific resources, which are increasingly exposed as web services. The rate of growth of scientific data makes eScience workflows data‐intensive, challenging existing workflow solutions. Efficient methods of handling large data in scientific workflows based on web services are needed. The purpose of this paper is to address this issue. DESIGN/METHODOLOGY/APPROACH In a previous paper the authors proposed Data‐Flow Delegation (DFD) as a means to optimize orchestrated workflow performance, focusing on SOAP web services. To improve the performance further, they propose pipelined data‐flow delegation (PDFD) for web service‐based eScience workflows in this paper, drawing on the domain of parallel programming. Briefly, PDFD allows partitioning of large datasets into independent subsets that can be communicated in a pipelined manner. FINDINGS The results show that PDFD improves the execution time of the workflow considerably and is capable of handling much larger data than the non‐pipelined approach. PRACTICAL IMPLICATIONS Execution of a web service‐based workflow hampered by the size of data can be facilitated or improved by using services supporting Pipelined Data‐Flow Delegation. ORIGINALITY/VALUE Contributions of this work include the proposed concept of combining pipelining and Data‐Flow Delegation, an XML Schema supporting the PDFD communication between services, and the practical evaluation of the PDFD approach.
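The pipelining idea in this abstract (partition a large dataset into independent subsets, then pass each subset from service to service as soon as it is ready) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper's services are SOAP web services exchanging data under an XML Schema, whereas here each "service" is assumed to be a plain function running in its own thread, and the names `partition`, `run_stage` and `pipelined_workflow` are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterator, List, Sequence

_DONE = object()  # sentinel marking the end of the chunk stream

Chunk = List[int]
Stage = Callable[[Chunk], Chunk]


def partition(data: Sequence[int], chunk_size: int) -> Iterator[Chunk]:
    """Split a dataset into independent, order-preserving subsets."""
    for start in range(0, len(data), chunk_size):
        yield list(data[start:start + chunk_size])


def run_stage(func: Stage, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Stand-in for one service: process each chunk on arrival and
    forward the result directly to the next stage."""
    while True:
        chunk = inbox.get()
        if chunk is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(chunk))


def pipelined_workflow(data: Sequence[int], stages: List[Stage],
                       chunk_size: int = 3) -> List[int]:
    """Wire the stages together; data flows stage to stage, chunk by chunk."""
    # Bounded queues keep only a few chunks in flight at any time.
    queues = [queue.Queue(maxsize=2) for _ in range(len(stages) + 1)]

    def feed() -> None:
        for chunk in partition(data, chunk_size):
            queues[0].put(chunk)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()

    results: List[int] = []
    while True:
        chunk = queues[-1].get()
        if chunk is _DONE:
            break
        results.extend(chunk)
    for t in threads:
        t.join()
    return results


def double(chunk: Chunk) -> Chunk:
    return [x * 2 for x in chunk]


def increment(chunk: Chunk) -> Chunk:
    return [x + 1 for x in chunk]
```

Calling `pipelined_workflow(list(range(10)), [double, increment])` lets `increment` start on the first chunk while `double` is still working on later ones, which is the overlap that pipelining is meant to provide. The coordinating function only wires the stages together and collects the final output; intermediate chunks never pass through it.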
|
45
|
Ison J, Kalas M, Jonassen I, Bolser D, Uludag M, McWilliam H, Malone J, Lopez R, Pettifer S, Rice P. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 2013; 29:1325-32. [PMID: 23479348 PMCID: PMC3654706 DOI: 10.1093/bioinformatics/btt113] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Received: 07/16/2012] [Revised: 02/28/2013] [Accepted: 03/01/2013] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. RESULTS EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. AVAILABILITY The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. CONTACT jison@ebi.ac.uk.
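As a rough illustration of how annotations along EDAM's axes (operation, data type, format, topic) could support finding tools and chaining them into workflows, the following Python sketch matches one tool's annotated output to another tool's annotated input. Everything here is an assumption for illustration: the `ToolAnnotation` record, the tool names, and the concept labels are hypothetical stand-ins and are not actual EDAM identifiers or an EDAM API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToolAnnotation:
    """Hypothetical annotation record using plain-text concept labels."""
    name: str
    operation: str
    input_data: str
    input_format: str
    output_data: str
    output_format: str
    topic: str


def compatible(upstream: ToolAnnotation, downstream: ToolAnnotation) -> bool:
    """Two tools can be chained if the data type and format line up."""
    return (upstream.output_data == downstream.input_data
            and upstream.output_format == downstream.input_format)


def chainable_pairs(tools: List[ToolAnnotation]) -> List[Tuple[str, str]]:
    """List every (upstream, downstream) pair that could be connected."""
    return [(a.name, b.name)
            for a in tools for b in tools
            if a is not b and compatible(a, b)]


# Illustrative tools only; labels are not real ontology terms.
TOOLS = [
    ToolAnnotation("fetcher", "Data retrieval", "Identifier", "Text",
                   "Sequence", "FASTA", "Data management"),
    ToolAnnotation("aligner", "Multiple sequence alignment", "Sequence", "FASTA",
                   "Sequence alignment", "Alignment format", "Sequence analysis"),
    ToolAnnotation("tree_builder", "Phylogenetic tree generation",
                   "Sequence alignment", "Alignment format",
                   "Phylogenetic tree", "Newick", "Phylogeny"),
]
```

With these three records, `chainable_pairs(TOOLS)` links `fetcher` to `aligner` and `aligner` to `tree_builder`. A real registry would compare ontology concepts, including parent and child terms, rather than exact strings.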
Affiliation(s)
- Jon Ison: EMBL European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
|
46
|
McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res 2013; 41:W597-600. [PMID: 23671338 PMCID: PMC3692137 DOI: 10.1093/nar/gkt376] [Citation(s) in RCA: 1184] [Impact Index Per Article: 107.6] [Indexed: 01/25/2023] Open
Abstract
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Affiliation(s)
- Hamish McWilliam: EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK
|
47
|
Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P, Bhagat J, Belhajjame K, Bacall F, Hardisty A, Nieva de la Hidalga A, Balcazar Vargas MP, Sufi S, Goble C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res 2013; 41:W557-61. [PMID: 23640334 PMCID: PMC3692062 DOI: 10.1093/nar/gkt328] [Citation(s) in RCA: 482] [Impact Index Per Article: 43.8] [Indexed: 01/29/2023] Open
Abstract
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
|
48
|
Korcsmaros T, Dunai ZA, Vellai T, Csermely P. Teaching the bioinformatics of signaling networks: an integrated approach to facilitate multi-disciplinary learning. Brief Bioinform 2013; 14:618-32. [PMID: 23640570 DOI: 10.1093/bib/bbt024] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023] Open
Abstract
The number of bioinformatics tools and resources that support molecular and cell biology approaches is continuously expanding. Moreover, systems and network biology analyses are accompanied more and more by integrated bioinformatics methods. Traditional information-centered university teaching methods often fail, as (1) it is impossible to cover all existing approaches in the frame of a single course, and (2) a large segment of the current bioinformation can become obsolete in a few years. Signaling networks offer an excellent example for teaching bioinformatics resources and tools, as they are both focused and complex at the same time. Here, we present an outline of a university bioinformatics course with four sample practices to demonstrate how signaling network studies can integrate biochemistry, genetics, cell biology and network sciences. We show that several bioinformatics resources and tools, as well as important concepts and current trends, can also be integrated into signaling network studies. The research-type hands-on experiences we show enable the students to improve key competences such as teamwork, creative and critical thinking and problem solving. Our classroom course curriculum can be re-formulated as e-learning material or applied as a part of a specific training course. The multi-disciplinary approach and the mosaic setup of the course have the additional benefit of supporting the advanced teaching of talented students.
Affiliation(s)
- Tamas Korcsmaros: Department of Genetics, Eotvos Lorand University, H-1117 Budapest, Pázmány s. 1/C, Hungary. Tel.: +36302686590
|
49
|
Pérez M, Berlanga R, Sanz I, Aramburu MJ. BioUSeR: a semantic-based tool for retrieving Life Science web resources driven by text-rich user requirements. J Biomed Semantics 2013; 4:12. [PMID: 23635042 PMCID: PMC3698192 DOI: 10.1186/2041-1480-4-12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/02/2012] [Accepted: 04/18/2013] [Indexed: 12/05/2022] Open
Abstract
Background Open metadata registries are a fundamental tool for researchers in the Life Sciences trying to locate resources. While most current registries assume that resources are annotated with well-structured metadata, evidence shows that most of the resource annotations simply consist of informal free text. This reality must be taken into account in order to develop effective techniques for resource discovery in Life Sciences. Results BioUSeR is a semantic-based tool aimed at retrieving Life Sciences resources described in free text. The retrieval process is driven by the user requirements, which consist of a target task and a set of facets of interest, both expressed in free text. BioUSeR is able to effectively exploit the available textual descriptions to find relevant resources by using semantic-aware techniques. Conclusions BioUSeR overcomes the limitations of the current registries thanks to: (i) rich specification of user information needs, (ii) use of semantics to manage textual descriptions, (iii) retrieval and ranking of resources based on user requirements.
Affiliation(s)
- María Pérez: Department of Computer Science and Engineering, Universitat Jaume I, Castellón, Spain.
|
50
|
Wollbrett J, Larmande P, de Lamotte F, Ruiz M. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases. BMC Bioinformatics 2013; 14:126. [PMID: 23586394 PMCID: PMC3680174 DOI: 10.1186/1471-2105-14-126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/23/2012] [Accepted: 03/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
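The general idea of generating a SPARQL query from an annotated relational schema can be sketched in a few lines of Python. This is only a simplified illustration under stated assumptions, not BioSemantic's actual algorithm or API: the `ColumnAnnotation` and `TableAnnotation` records, the `build_select` function, and the `ex:` ontology terms are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnAnnotation:
    column: str     # relational column name, reused as the SPARQL variable
    predicate: str  # ontology property (prefixed name) the column maps to


@dataclass
class TableAnnotation:
    table: str      # relational table name, reused as the subject variable
    rdf_class: str  # ontology class (prefixed name) the rows map to
    columns: List[ColumnAnnotation]


def build_select(table: TableAnnotation, prefixes: Dict[str, str]) -> str:
    """Emit a SELECT query with one triple pattern per annotated column."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    variables = " ".join(f"?{c.column}" for c in table.columns)
    lines.append(f"SELECT ?{table.table} {variables}")
    lines.append("WHERE {")
    lines.append(f"  ?{table.table} a {table.rdf_class} .")
    for c in table.columns:
        lines.append(f"  ?{table.table} {c.predicate} ?{c.column} .")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical annotated table and namespace, for illustration only.
GENE = TableAnnotation(
    table="gene",
    rdf_class="ex:Gene",
    columns=[
        ColumnAnnotation("name", "ex:hasName"),
        ColumnAnnotation("chromosome", "ex:locatedOn"),
    ],
)
QUERY = build_select(GENE, {"ex": "http://example.org/onto#"})
```

Printing `QUERY` shows a `PREFIX` declaration, a `SELECT ?gene ?name ?chromosome` clause, and a `WHERE` block containing a class pattern plus one pattern per column. A fuller generator would also follow foreign keys to join several annotated tables, which this sketch leaves out.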
|