1
Chowdhury HMAM, Boult T, Oluwadare O. Comparative study on chromatin loop callers using Hi-C data reveals their effectiveness. BMC Bioinformatics 2024; 25:123. [PMID: 38515011 PMCID: PMC10958853 DOI: 10.1186/s12859-024-05713-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/19/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND The chromosome is one of the most fundamental parts of cell biology, where DNA holds the hierarchical information. DNA compacts its size by forming loops, and these regions house various protein particles, including CTCF, SMC3, and H3 histone. Numerous sequencing methods, such as Hi-C, ChIP-seq, and Micro-C, have been developed to investigate these properties. Utilizing these data, scientists have developed a variety of loop prediction techniques that have greatly improved their methods for characterizing loop prediction and related aspects. RESULTS In this study, we categorized 22 loop calling methods and conducted a comprehensive study of 11 of them. Additionally, we have provided detailed insights into the methodologies underlying these algorithms for loop detection, categorizing them into five distinct groups based on their fundamental approaches. Furthermore, we have included critical information such as resolution, input and output formats, and parameters. For this analysis, we utilized the GM12878 Hi-C datasets at 5 KB, 10 KB, 100 KB and 250 KB resolutions. Our evaluation criteria encompassed various factors, including memory usage, running time, sequencing depth, and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. CONCLUSION This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to effectively choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score (BCC score) to measure their overall robustness, ensuring a comprehensive evaluation of their performance.
Affiliation(s)
- H M A Mohit Chowdhury
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA
- Terrance Boult
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA.
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
2
Alomair L, Abolfotouh MA. Awareness and Predictors of the Use of Bioinformatics in Genome Research in Saudi Arabia. Int J Gen Med 2023; 16:3413-3425. [PMID: 37587979 PMCID: PMC10426440 DOI: 10.2147/ijgm.s421815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 08/03/2023] [Indexed: 08/18/2023] Open
Abstract
Background With the advances in genomics research, many countries still need more bioinformatics skills. This study aimed to assess the levels of awareness of bioinformatics and predictors of its use in genomics research among scientists in Saudi Arabia. Methods In a cross-sectional survey, 309 scientists of different biological and biomedical specialties were subjected to a previously validated e-questionnaire to collect data on (1) knowledge about bioinformatics programming languages and tools, (2) attitude toward acceptance of bioinformatics resources in genome-related research, and (3) the pattern of information-seeking to online bioinformatics resources. Logistic regression analysis was applied to identify the predictors of using bioinformatics in research. Significance was set at p<0.05. Results More than one-half (248, 56.4%) of all scientists reported a lack of bioinformatics knowledge. Most participants had a neutral attitude toward bioinformatics (295, 95.4%). The reported barriers to acceptance of bioinformatics tools were lack of training (210, 67.9%), insufficient support (180, 58.2%), and complexity of software (138, 44.6%). Limited experience was reported in having one or more bioinformatics tools (98, 31.7%), using a supercomputer in their research inside (44, 14.2%) and outside Saudi Arabia (55, 17.8%), the need for developing a program to solve a biological problem (129, 41.7%), working in one or more fields of bioinformatics (93, 30.1%), using web applications (112, 36.2%), and using programming languages (102, 33.0%). Significant predictors of conducting genomics research were younger age (p=0.039), Ph.D. education (p=0.003), more than five years of experience (p<0.05), previous training (p<0.001), and higher bioinformatics knowledge scores (p<0.001). Conclusion The study revealed limited knowledge, a neutral attitude, a lack of resources, and limited use of bioinformatics resources in genomics research. Education and training during each education level and during the job are recommended. Cloud-based resources may help scientists do research using publicly available Omics data. Further studies are necessary to evaluate collaboration among bioinformatics software developers and biologists.
Affiliation(s)
- Lamya Alomair
- AI and Bioinformatics Department, King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard-Health Affairs, Riyadh, Saudi Arabia
- King Saud Bin-Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard-Health Affairs, Riyadh, Saudi Arabia
- Mostafa A Abolfotouh
- King Saud Bin-Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard-Health Affairs, Riyadh, Saudi Arabia
- Research Training and Development Section, King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard-Health Affairs, Riyadh, Saudi Arabia
3
Won JI, Park S, Yoon JH, Kim SW. An efficient approach for sequence matching in large DNA databases. J Inf Sci 2016. [DOI: 10.1177/0165551506059229] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
In molecular biology, DNA sequence matching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, fast indexes are essential for efficient processing of DNA sequence matching. In this paper, we first point out the problems of the suffix tree, an index structure widely used for DNA sequence matching, with respect to storage overhead, search performance, and difficulty of seamless integration with a DBMS. Then, we propose a new index structure that resolves these problems. The proposed index structure consists of two parts: the primary part realizes the trie as a binary bit-string representation without any pointers, and the secondary part provides fast access to the trie's leaf nodes that need to be accessed for post-processing. We also suggest efficient algorithms based on that index for DNA sequence matching. To verify the superiority of the proposed approach, we conduct performance evaluation via a series of experiments. The results reveal that the proposed approach, which requires smaller storage space, can be a few orders of magnitude faster than the suffix tree.
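To make the idea of a trie stored as a bit string without pointers more concrete, here is a minimal sketch. It is not the index structure from the paper, whose layout the abstract does not specify. It assumes a 2-bit code per base, a preorder layout with two flag bits per node, and hypothetical helper names (`build_suffix_trie`, `serialize`, `skip_subtree`, `contains`).

```python
# Hypothetical sketch: a binary suffix trie flattened into a pointer-free bit string.
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}  # assumed 2-bit base code


def to_bits(seq):
    """Encode a DNA string as a string of '0'/'1' characters."""
    return "".join(ENCODE[base] for base in seq)


def build_suffix_trie(text):
    """Temporary nested-dict binary trie holding every suffix of text."""
    bits = to_bits(text)
    root = {}
    for start in range(0, len(bits), 2):  # suffixes start on base boundaries
        node = root
        for bit in bits[start:]:
            node = node.setdefault(bit, {})
    return root


def serialize(node):
    """Preorder layout: each node emits two flags (has 0-child, has 1-child)."""
    out = ("1" if "0" in node else "0") + ("1" if "1" in node else "0")
    for bit in "01":
        if bit in node:
            out += serialize(node[bit])
    return out


def skip_subtree(trie_bits, pos):
    """Return the index just past the subtree whose flags start at pos."""
    has_zero, has_one = trie_bits[pos], trie_bits[pos + 1]
    pos += 2
    if has_zero == "1":
        pos = skip_subtree(trie_bits, pos)
    if has_one == "1":
        pos = skip_subtree(trie_bits, pos)
    return pos


def contains(trie_bits, pattern):
    """Check whether pattern occurs in the indexed text, using only the bit string."""
    pos = 0
    for bit in to_bits(pattern):
        has_zero, has_one = trie_bits[pos], trie_bits[pos + 1]
        child = pos + 2  # the 0-child, if present, directly follows the flags
        if bit == "0":
            if has_zero != "1":
                return False
        else:
            if has_one != "1":
                return False
            if has_zero == "1":
                child = skip_subtree(trie_bits, child)  # 1-child follows the 0-subtree
        pos = child
    return True


TRIE_BITS = serialize(build_suffix_trie("GATTACA"))
print(contains(TRIE_BITS, "TTA"), contains(TRIE_BITS, "TAG"))
```

The sketch trades lookup speed for compactness: reaching a 1-child means scanning past the whole 0-subtree. A secondary access structure, such as the one the abstract mentions for leaf nodes, is one way a real index might avoid that cost.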
Affiliation(s)
- Sanghyun Park
- Department of Computer Science, Yonsei University, Korea
- Jee-Hee Yoon
- Division of Information Engineering and Telecommunications, Hallym University, Korea
- Sang-Wook Kim
- College of Information and Communications, Hanyang University, Korea
4
Brown JAL. Evaluating the effectiveness of a practical inquiry-based learning bioinformatics module on undergraduate student engagement and applied skills. BIOCHEMISTRY AND MOLECULAR BIOLOGY EDUCATION: A BIMONTHLY PUBLICATION OF THE INTERNATIONAL UNION OF BIOCHEMISTRY AND MOLECULAR BIOLOGY 2016; 44:304-13. [PMID: 27161812 DOI: 10.1002/bmb.20954] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/25/2015] [Revised: 11/20/2015] [Accepted: 12/08/2015] [Indexed: 05/27/2023]
Abstract
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects, therefore high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, evaluation of the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module specific elements. Overall, a positive experience was found, as was a post module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge. 
© 2016 by The International Union of Biochemistry and Molecular Biology, 44:304-313 2016.
Affiliation(s)
- James A L Brown
- Department of Biochemistry, School of Natural Sciences, National University of Ireland Galway, Ireland and Discipline of Surgery, School of Medicine, Lambe Institute for Translational Research, National University of Ireland Galway, Ireland
5
Krajewski P, Chen D, Ćwiek H, van Dijk ADJ, Fiorani F, Kersey P, Klukas C, Lange M, Markiewicz A, Nap JP, van Oeveren J, Pommier C, Scholz U, van Schriek M, Usadel B, Weise S. Towards recommendations for metadata and data handling in plant phenotyping. JOURNAL OF EXPERIMENTAL BOTANY 2015; 66:5417-27. [PMID: 26044092 DOI: 10.1093/jxb/erv271] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Indexed: 05/18/2023]
Abstract
Recent methodological developments in plant phenotyping, as well as the growing importance of its applications in plant science and breeding, are resulting in a fast accumulation of multidimensional data. There is great potential for expediting both discovery and application if these data are made publicly available for analysis. However, collection and storage of phenotypic observations is not yet sufficiently governed by standards that would ensure interoperability among data providers and precisely link specific phenotypes and associated genomic sequence information. This lack of standards is mainly a result of a large variability of phenotyping protocols, the multitude of phenotypic traits that are measured, and the dependence of these traits on the environment. This paper discusses the current situation of standardization in the area of phenomics, points out the problems and shortages, and presents the areas that would benefit from improvement in this field. In addition, the foundations of the work that could revise the situation are proposed, and practical solutions developed by the authors are introduced.
Affiliation(s)
- Paweł Krajewski
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, Poznań, Poland
- Dijun Chen
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany
- Hanna Ćwiek
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, Poznań, Poland
- Aalt D J van Dijk
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre (WUR), Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Fabio Fiorani
- Forschungszentrum Jülich, IBG-2 Plant Sciences, Jülich, Germany
- Paul Kersey
- The European Molecular Biology Laboratory-The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christian Klukas
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany
- Jan Peter Nap
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre (WUR), Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Jan van Oeveren
- Keygene N.V., Agro Business Park 90, 6708 PW Wageningen, The Netherlands
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany
- Marco van Schriek
- Keygene N.V., Agro Business Park 90, 6708 PW Wageningen, The Netherlands
- Björn Usadel
- Forschungszentrum Jülich, IBG-2 Plant Sciences, Jülich, Germany
- RWTH Aachen, Worringer Weg 3, Institute of Biology I, Aachen, Germany
- Stephan Weise
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstrasse 3, D-06466 Stadt Seeland, Germany
6
Abstract
Purpose
– This paper aims to model trails, which are search patterns spanning several search systems across the heterogeneous information environment. In addition, the author seeks to examine what kinds of trails occur in routine, semi-complex and complex tasks, and what barrier types occur during the trail-blazing.
Design/methodology/approach
– The author used a qualitative task-based approach, shadowing six molecular medicine researchers over six months and collecting their web interaction logs. Data triangulation made this kind of detailed search system integration analysis possible.
Findings
– Five trail patterns emerged: branches, chains, lists, singles and berrypicking trails. Berrypicking was typical of complex work tasks, whereas branches were common in routine work tasks. Singles and lists were typically employed in semi-complex tasks. In all kinds of trails, barriers often occurred during the interaction with a single system, but there was also a considerable number of barriers caused by malfunctioning system integration and missing integration features. The findings suggest that the trails could be used to reduce the amount of laborious manual system integration, and that there is a need to support the explorative search process in berrypicking trails.
Originality/value
– Research on information behaviour yielding different types of search patterns with several search systems during real-world work task performance in molecular medicine has not been published previously. The author presents a task-based approach to modelling search behaviour patterns. The author discusses the issue of system integration, which is a great challenge in the biomedical domain, from the viewpoints of information studies and search behaviour.
7
Guan D, Yuan W, Ma T, Lee S. Detecting potential labeling errors for bioinformatics by multiple voting. Knowl Based Syst 2014. [DOI: 10.1016/j.knosys.2014.04.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
8
Gong P. Dynamic integration of biological data sources using the data concierge. Health Inf Sci Syst 2013; 1:7. [PMID: 25825659 PMCID: PMC4340781 DOI: 10.1186/2047-2501-1-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/18/2012] [Accepted: 09/26/2012] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND The ever-changing landscape of large-scale network environments and innovative biology technologies requires dynamic mechanisms to rapidly integrate previously unknown bioinformatics sources at runtime. However, existing integration technologies lack sufficient flexibility to adapt to these changes, because the techniques used for integration are static, and sensitive to new or changing bioinformatics source implementations and evolutionary biologist requirements. METHODS To address this challenge, in this paper we propose a new semantics-based adaptive middleware, the Data Concierge, which is able to dynamically integrate heterogeneous biological data sources without the need for wrappers. Along with the architecture necessary to facilitate dynamic integration, an API description mechanism is proposed to dynamically classify, recognize, locate, and invoke newly added biological data source functionalities. Based on the unified semantic metadata, XML-based state machines are able to provide flexible configurations to execute biologists' abstract and complex operations. RESULTS AND DISCUSSION Experimental results demonstrate that for obtaining dynamic features, the Data Concierge sacrifices reasonable performance on reasoning knowledge models and dynamically doing data source API invocations. The overall costs to integrate new biological data sources are significantly lower when using the Data Concierge. CONCLUSIONS The Data Concierge facilitates the rapid integration of new biological data sources in existing applications with no repetitive software development required, and hence, this mechanism would provide a cost-effective solution to the labor-intensive software engineering tasks.
Affiliation(s)
- Peng Gong
- Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, the University of Sydney, Sydney, NSW 2006 Australia
- Department of PET and Nuclear Medicine, RPA Hospital, Camperdown, NSW 2050 Australia
9
Ram S, Laxman Rao N. iBIRA – integrated bioinformatics information resource access. REFERENCE SERVICES REVIEW 2012. [DOI: 10.1108/00907321211228354] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Purpose
– Bioinformatics is an emerging discipline where interdisciplinary research holds great promise for the advancement of research and development in many complex areas. The research output generates a huge amount of data and information. Because of the heterogeneous nature of bioinformatics resources, difficulty in accessing pertinent information is the biggest challenge for the bioinformatics community. The integration of bioinformatics resources in a comprehensive manner is advocated by the bioinformatics user community as well as by information scientists serving this community. There have already been some efforts made for integration of bioinformatics resources by the discrete bioinformatics community, but these are based on the requirements of their own area and arena. This paper aims to discuss the design and development of a tool for the integration of various heterogeneous bioinformatics information resources available over the internet.
Design/methodology/approach
– The authors have developed a tool with the acronym “iBIRA” (Integrated Bioinformatics Information Resource Access) that associates the bioinformatics community with the bioinformatics “resourceome” (the term suggested for the “full set of bioinformatics resources” by Cannata et al.) available over the internet. iBIRA (www.ibiranet.in) integrates bioinformatics resources in such a way that it is possible to locate, connect and communicate different categories of resources in a cohesive manner. A software engineering and database‐driven approach was used for the integration and organization of bioinformatics resources. Computational programming such as Hypertext Preprocessor (PHP), a server‐side dynamic web programming language, and MySQL as a database management system have been used. Dublin Core Metadata Standards have been used for the design of metadata for bioinformatics resources.
Findings
– The term “resource” in the area of bioinformatics covers various entities such as journals, molecular biology databases, online annotation tools, patents, published documents (articles, books, etc), protocols, software tools, and web servers. It has been found that bioinformatics resources are heterogeneous in nature and available over the internet in different forms and formats. The fact that bioinformatics resources are scattered over the internet makes resource discovery difficult for the bioinformatics community, and there is a need for a system that reorganizes these resources. The integration of all the resources of bioinformatics at a single platform (called “iBIRA”) provides significant “value added” to the bioinformatics community and those serving this population.
Originality/value
– The iBIRA tool is a meta‐server developed to provide an information service about the availability of various bioinformatics resources to the bioinformatics community. This will provide a value‐added benefit to the population in helping them to locate relevant resources for their education, research and training.
10
A semantic approach for the requirement-driven discovery of web resources in the Life Sciences. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0498-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
11
Jupp S, Stevens R, Hoehndorf R. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL. J Biomed Semantics 2012; 3 Suppl 1:S3. [PMID: 22541594 PMCID: PMC3337258 DOI: 10.1186/2041-1480-3-s1-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/21/2023] Open
Abstract
MOTIVATION Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. RESULTS To support this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. 
CONCLUSION This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse. AVAILABILITY The GOAL Web page is to be found at http://owl.cs.manchester.ac.uk/goal.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
12
Galvez C, de Moya‐Anegón F. A dictionary‐based approach to normalizing gene names in one domain of knowledge from the biomedical literature. JOURNAL OF DOCUMENTATION 2012. [DOI: 10.1108/00220411211200301] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022]
13
14
Cohen-Boulakia S, Davidson S, Froidevaux C, Lacroix Z, Vidal ME. PATH-BASED SYSTEMS TO GUIDE SCIENTISTS IN THE MAZE OF BIOLOGICAL DATA SOURCES. J Bioinform Comput Biol 2011; 4:1069-95. [PMID: 17099942 DOI: 10.1142/s0219720006002375] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 05/17/2006] [Revised: 07/11/2006] [Accepted: 07/11/2006] [Indexed: 11/18/2022]
Abstract
Fueled by novel technologies capable of producing massive amounts of data for a single experiment, scientists are faced with an explosion of information which must be rapidly analyzed and combined with other data to form hypotheses and create knowledge. Today, numerous biological questions can be answered without entering a wet lab. Scientific protocols designed to answer these questions can be run entirely on a computer. Biological resources are often complementary, focused on different objects and reflecting various experts' points of view. Exploiting the richness and diversity of these resources is crucial for scientists. However, with the increase of resources, scientists have to face the problem of selecting sources and tools when interpreting their data. In this paper, we analyze the way in which biologists express and implement scientific protocols, and we identify the requirements for a system which can guide scientists in constructing protocols to answer new biological questions. We present two such systems, BioNavigation and BioGuide, dedicated to helping scientists select resources by following suitable paths within the growing network of interconnected biological resources.
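The notion of following paths within a network of interconnected resources can be illustrated with a small sketch. This is not how BioNavigation or BioGuide are implemented; the abstract does not describe their algorithms. The source names in `LINKS` and the function `simple_paths` are hypothetical, and the sketch assumes resources are modelled as a directed graph whose edges are cross-references between sources.

```python
# Hypothetical sketch: enumerate cycle-free paths between two data sources.
LINKS = {  # assumed cross-reference graph; all source names are made up
    "gene_db": ["protein_db", "literature_db"],
    "protein_db": ["disease_db", "literature_db"],
    "literature_db": ["disease_db"],
    "disease_db": [],
}


def simple_paths(graph, start, goal, path=None):
    """Depth-first enumeration of all paths from start to goal without repeated nodes."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # never revisit a source already on this path
            found.extend(simple_paths(graph, nxt, goal, path))
    return found


paths = simple_paths(LINKS, "gene_db", "disease_db")
for p in paths:
    print(" -> ".join(p))
```

Each printed path is one candidate route from a starting source to a target source. A guidance system would presumably rank such routes by criteria the scientist cares about, which this sketch leaves out.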
Affiliation(s)
- Sarah Cohen-Boulakia
- Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut St, Philadelphia, PA 19104, USA.
15
Triplet T, Shortridge MD, Griep MA, Stark JL, Powers R, Revesz P. PROFESS: a PROtein function, evolution, structure and sequence database. Database (Oxford) 2010; 2010:baq011. [PMID: 20624718 PMCID: PMC2911846 DOI: 10.1093/database/baq011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/02/2009] [Revised: 06/03/2010] [Accepted: 06/06/2010] [Indexed: 11/13/2022]
Abstract
The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are approximately 1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein-protein interaction networks. Database URL: http://cse.unl.edu/~profess/
Affiliation(s)
- Thomas Triplet
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Matthew D. Shortridge
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Mark A. Griep
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Jaime L. Stark
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Robert Powers
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Peter Revesz
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
16
Mirel B. Supporting cognition in systems biology analysis: findings on users' processes and design implications. JOURNAL OF BIOMEDICAL DISCOVERY AND COLLABORATION 2009; 4:2. [PMID: 19216777 PMCID: PMC2649900 DOI: 10.1186/1747-5333-4-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 10/03/2008] [Accepted: 02/13/2009] [Indexed: 01/19/2023]
Abstract
BACKGROUND Current usability studies of bioinformatics tools suggest that tools for exploratory analysis support some tasks related to finding relationships of interest but not the deep causal insights necessary for formulating plausible and credible hypotheses. To better understand design requirements for gaining these causal insights in systems biology analyses a longitudinal field study of 15 biomedical researchers was conducted. Researchers interacted with the same protein-protein interaction tools to discover possible disease mechanisms for further experimentation. RESULTS Findings reveal patterns in scientists' exploratory and explanatory analysis and reveal that tools positively supported a number of well-structured query and analysis tasks. But for several of scientists' more complex, higher order ways of knowing and reasoning the tools did not offer adequate support. Results show that for a better fit with scientists' cognition for exploratory analysis systems biology tools need to better match scientists' processes for validating, for making a transition from classification to model-based reasoning, and for engaging in causal mental modelling. CONCLUSION As the next great frontier in bioinformatics usability, tool designs for exploratory systems biology analysis need to move beyond the successes already achieved in supporting formulaic query and analysis tasks and now reduce current mismatches with several of scientists' higher order analytical practices. The implications of results for tool designs are discussed.
Affiliation(s)
- Barbara Mirel
- School of Education, University of Michigan, Ann Arbor, Michigan, USA.
|
17
|
Bolchini D, Finkelstein A, Perrone V, Nagl S. Better bioinformatics through usability analysis. Bioinformatics 2008; 25:406-12. [PMID: 19073592 DOI: 10.1093/bioinformatics/btn633] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a satisfactory user experience and force researchers to spend unnecessary time and effort to complete their tasks. The number of online biological databases available is growing and there is an expanding community of diverse users. In this context there is an increasing need to ensure the highest standards of usability. RESULTS Using 'state-of-the-art' usability evaluation methods, we have identified and characterized a sample of usability issues potentially relevant to web bioinformatics resources, in general. These specifically concern the design of the navigation and search mechanisms available to the user. The usability issues we have discovered in our substantial case studies are undermining the ability of users to find the information they need in their daily research activities. In addition to characterizing these issues, specific recommendations for improvements are proposed leveraging proven practices from web and usability engineering. The methods and approach we exemplify can be readily adopted by the developers of bioinformatics resources.
Affiliation(s)
- Davide Bolchini
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
|
18
|
Goble C, Stevens R, Hull D, Wolstencroft K, Lopez R. Data curation + process curation = data integration + science. Brief Bioinform 2008; 9:506-17. [PMID: 19060304 DOI: 10.1093/bib/bbn034] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022] Open
Abstract
In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Programmatic access to services, for data and processes, means that compositions of services can be made that represent the in silico experiments or processes that bioinformaticians perform. Data integration through workflows depends on being able to know what services exist and where to find those services. The large number of services and the operations they perform, their arbitrary naming and lack of documentation, however, mean that they can be difficult to use. The workflows themselves are composite processes that could be pooled and reused but only if they too can be found and understood. Thus appropriate curation, including semantic mark-up, would enable processes to be found, maintained and consequently used more easily. This broader view on semantic annotation is vital for full data integration that is necessary for the modern scientific analyses in biology. This article will brief the community on the current state of the art and the current challenges for process curation, both within and without the Life Sciences.
Affiliation(s)
- Carole Goble
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
|
19
|
Qu C, Zimmermann F, Kumpf K, Kamuzinzi R, Ledent V, Herzog R. Semantics-enabled service discovery framework in the SIMDAT pharma grid. IEEE Transactions on Information Technology in Biomedicine 2008; 12:182-190. [PMID: 18348948 DOI: 10.1109/titb.2007.907987] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/26/2023]
Abstract
We present the design and implementation of a semantics-enabled service discovery framework in the data Grids for process and product development using numerical simulation and knowledge discovery (SIMDAT) Pharma Grid, an industry-oriented Grid environment for integrating thousands of Grid-enabled biological data services and analysis services. The framework consists of three major components: the Web ontology language (OWL)-description logic (DL)-based biological domain ontology, OWL Web service ontology (OWL-S)-based service annotation, and semantic matchmaker based on the ontology reasoning. Built upon the framework, workflow technologies are extensively exploited in the SIMDAT to assist biologists in (semi)automatically performing in silico experiments. We present a typical usage scenario through the case study of a biological workflow: IXodus.
Affiliation(s)
- Cangtao Qu
- IT Research Division, NEC Laboratories, Europe, NEC Europe Ltd., D-53757 Sankt Augustin, Germany.
|
20
|
Hidders J, Kwasnikowska N, Sroka J, Tyszkiewicz J, Van den Bussche J. A Formal Model of Dataflow Repositories. Lecture Notes in Computer Science 2007. [DOI: 10.1007/978-3-540-73255-6_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
|
21
|
Saraiya P, North C, Lam V, Duca KA. An insight-based longitudinal study of visual analytics. IEEE Transactions on Visualization and Computer Graphics 2006; 12:1511-22. [PMID: 17073373 DOI: 10.1109/tvcg.2006.85] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 05/12/2023]
Abstract
Visualization tools are typically evaluated in controlled studies that observe the short-term usage of these tools by participants on preselected data sets and benchmark tasks. Though such studies provide useful suggestions, they miss the long-term usage of the tools. A longitudinal study of a bioinformatics data set analysis is reported here. The main focus of this work is to capture the entire analysis process that an analyst goes through from a raw data set to the insights sought from the data. The study provides interesting observations about the use of visual representations and interaction mechanisms provided by the tools, and also about the process of insight generation in general. This deepens our understanding of visual analytics, guides visualization developers in creating more effective visualization tools in terms of user requirements, and guides evaluators in designing future studies that are more representative of insights sought by users from their data sets.
Affiliation(s)
- Purvi Saraiya
- Department of Computer Science, Virginia Tech, Blacksburg 24061-0106, USA.
|
22
|
Philippi S, Köhler J. Automated structure extraction and XML conversion of life science database flat files. IEEE Transactions on Information Technology in Biomedicine 2006; 10:714-21. [PMID: 17044405 DOI: 10.1109/titb.2006.875653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/12/2023]
Abstract
In the light of the increasing number of biological databases, their integration is a fundamental prerequisite for answering complex biological questions. Database integration, therefore, is an important area of research in bioinformatics. Since most of the publicly available life science databases are still exclusively exchanged by means of proprietary flat files, database integration requires parsers for very different flat file formats. Unfortunately, the development and maintenance of database specific flat file parsers is a nontrivial and time-consuming task, which takes considerable effort in large-scale integration scenarios. This paper introduces heuristically based concepts for automatic structure extraction from life science database flat files. On the basis of these concepts the FlatEx prototype is developed for the automatic conversion of flat files into XML representations.
|
23
|
Abstract
Web services provide a standard way of publishing applications and data sources over the internet, enabling mass dissemination of knowledge. In the life sciences, the web-service approach is seen as being a road to standardizing the multitude of tools available from different providers. In this article, we present an overview of the technology (focusing on life-science applications), we list the currently available service providers and we discuss advanced issues raised by the concept.
Affiliation(s)
- Vasa Curcin
- Department of Computing, Imperial College, 180 Queen's Gate, London, UK, SW7 2AZ.
|
24
|
Gutiérrez RA, Shasha DE, Coruzzi GM. Systems biology for the virtual plant. Plant Physiology 2005; 138:550-4. [PMID: 15955912 PMCID: PMC1150368 DOI: 10.1104/pp.104.900150] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 05/03/2023]
|
25
|
Garcia Castro A, Thoraval S, Garcia LJ, Ragan MA. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator. BMC Bioinformatics 2005; 6:87. [PMID: 15813976 PMCID: PMC1090554 DOI: 10.1186/1471-2105-6-87] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 12/21/2004] [Accepted: 04/07/2005] [Indexed: 11/14/2022] Open
Abstract
Background Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. Results We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. Availability: (interactive), (download). Conclusion From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
Affiliation(s)
- Alexander Garcia Castro
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Qld 4072, Australia
- Australian Research Council (ARC) Centre in Bioinformatics, Australia
- Samuel Thoraval
- Australian Research Council (ARC) Centre in Bioinformatics, Australia
- LIBROPHYT, Bioinformatique, Centre de Cadarache, Bâtiment 185, DEVM, 13108 St Paul-Lez-Durance, France
- Mark A Ragan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Qld 4072, Australia
- Australian Research Council (ARC) Centre in Bioinformatics, Australia
|
26
|
Dadzie AS, Burger A. Providing visualisation support for the analysis of anatomy ontology data. BMC Bioinformatics 2005; 6:74. [PMID: 15790390 PMCID: PMC1087473 DOI: 10.1186/1471-2105-6-74] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 08/18/2004] [Accepted: 03/24/2005] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Improvements in technology have been accompanied by the generation of large amounts of complex data. This same technology must be harnessed effectively if the knowledge stored within the data is to be retrieved. Storing data in ontologies aids its management; ontologies serve as controlled vocabularies that promote data exchange and re-use, improving analysis. The Edinburgh Mouse Atlas Project stores the developmental stages of the mouse embryo in anatomy ontologies. This project is looking at the use of visual data overviews for intuitive analysis of the ontology data. RESULTS A prototype has been developed that visualises the ontologies using directed acyclic graphs in two dimensions, with the ability to study detail in regions of interest in isolation or within the context of the overview. This is followed by the development of a technique that layers individual anatomy ontologies in three-dimensional space, so that relationships across multiple data sets may be mapped using physical links drawn along the third axis. CONCLUSION Usability evaluations of the applications confirmed advantages in visual analysis of complex data. This project will look next at data input from multiple sources, and continue to develop the techniques presented to provide intuitive identification of relationships that span multiple ontologies.
Affiliation(s)
- Aba-Sah Dadzie
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, Scotland
- Albert Burger
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, Scotland
- Medical Research Council, Human Genetics Unit, Western General Hospital, Edinburgh EH4 2XU, Scotland
|
27
|
|
28
|
MacMullen WJ, Denn SO. Information problems in molecular biology and bioinformatics. Journal of the American Society for Information Science and Technology 2005. [DOI: 10.1002/asi.20134] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/11/2022]
|
29
|
|
30
|
Feta: A Light-Weight Architecture for User Oriented Semantic Service Discovery. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11431053_2] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]
|
31
|
Philippi S, Köhler J. Using XML Technology for the Ontology-Based Semantic Integration of Life Science Databases. IEEE Transactions on Information Technology in Biomedicine 2004; 8:154-60. [PMID: 15217260 DOI: 10.1109/titb.2004.826724] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Several hundred internet accessible life science databases with constantly growing contents and varying areas of specialization are publicly available via the internet. Database integration, consequently, is a fundamental prerequisite to be able to answer complex biological questions. Due to the presence of syntactic, schematic, and semantic heterogeneities, large scale database integration at present takes considerable efforts. As there is a growing apprehension of extensible markup language (XML) as a means for data exchange in the life sciences, this article focuses on the impact of XML technology on database integration in this area. In detail, a general architecture for ontology-driven data integration based on XML technology is introduced, which overcomes some of the traditional problems in this area. As a proof of concept, a prototypical implementation of this architecture based on a native XML database and an expert system shell is described for the realization of a real world integration scenario.
|
32
|
|
33
|
Applying Semantic Web Services to Bioinformatics: Experiences Gained, Lessons Learnt. The Semantic Web – ISWC 2004, 2004. [DOI: 10.1007/978-3-540-30475-3_25] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 02/09/2023]
|
34
|
Hoon S, Ratnapu KK, Chia JM, Kumarasamy B, Juguang X, Clamp M, Stabenau A, Potter S, Clarke L, Stupka E. Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res 2003; 13:1904-15. [PMID: 12869579 PMCID: PMC403782 DOI: 10.1101/gr.1363103] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
Abstract
We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.
Affiliation(s)
- Shawn Hoon
- Institute of Molecular and Cell Biology, National University of Singapore, Singapore 117609
|
35
|
|
36
|
Stevens R, Goble C, Paton NW, Bechhofer S, Ng G, Baker P, Brass A. Complex Query Formulation Over Diverse Information Sources in TAMBIS. Bioinformatics 2003. [DOI: 10.1016/b978-155860829-0/50009-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023] Open
|
37
|
Issues to Address While Designing a Biological Information System. Bioinformatics 2003. [DOI: 10.1016/b978-155860829-0/50006-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] Open
|