1. dbGAPs: A comprehensive database of genes and genetic markers associated with psoriasis and its subtypes. Genomics 2017; 110:S0888-7543(17)30115-5. PMID: 29031638; DOI: 10.1016/j.ygeno.2017.10.003.
Abstract
Psoriasis is a systemic hyperproliferative inflammatory skin disorder that, although rarely fatal, significantly reduces quality of life. Understanding the full genetic component of disease association may provide insight into biological pathways as well as targets and biomarkers for diagnosis, prognosis and therapy. Studies of psoriasis-associated genes and genetic markers are scattered and not easily amenable to data mining. To alleviate these difficulties, we have developed dbGAPs, an integrated knowledgebase serving as a gateway to psoriasis-associated genomic data. The database contains annotations, with cross-references, for 202 manually curated genes associated with psoriasis and its subtypes. Functional enrichment of these genes, in the context of Gene Ontology and pathways, provides insight into their roles in psoriasis etiology and pathogenesis. The dbGAPs interface offers an interactive search engine for data retrieval along with customized tools for Single Nucleotide Polymorphism (SNP)/indel detection and annotation. dbGAPs is accessible at http://www.bmicnip.in/dbgaps/.
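The kind of SNP/indel annotation such a resource provides can be illustrated as a minimal interval-lookup sketch. The gene names and coordinates below are illustrative assumptions, not data drawn from dbGAPs:

```python
# Illustrative gene intervals; names and coordinates are assumptions,
# not data drawn from dbGAPs.
GENES = [
    ("IL23R", "chr1", 67_138_906, 67_259_979),
    ("HLA-C", "chr6", 31_268_749, 31_272_130),
]

def annotate_variant(chrom, pos):
    """Return the gene a variant position falls inside, else 'intergenic'."""
    for name, c, start, end in GENES:
        if c == chrom and start <= pos <= end:
            return name
    return "intergenic"

result = annotate_variant("chr1", 67_200_000)  # falls inside the IL23R interval above
```

A real annotator would work from a full transcript model (exon/intron boundaries, consequence prediction); the point here is only the position-to-gene mapping step.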

2. A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining. Comput Biol Med 2017; 89:264-274. PMID: 28850898; DOI: 10.1016/j.compbiomed.2017.08.021.

3. Using the domain analytical approach in the study of information practices in biomedicine. Journal of Documentation 2016. DOI: 10.1108/jd-11-2015-0139.
Abstract
Purpose
The purpose of this paper is to analyze the information practices of the researchers in biomedicine using the domain analytical approach.
Design/methodology/approach
The domain analytical approach applied to the scientific domain of biomedicine leads to the study of how the sciences are organized. Using Whitley's dimensions of "mutual dependence" and "task uncertainty" in scientific work as a starting point, the authors were able to reanalyze previously collected data. By unpacking these concepts in the context of biomedical research work, the authors analyzed the distinguishing features of the biomedical domain and the ways these features affect researchers' information practices.
Findings
Several indicators of "task uncertainty" and "mutual dependence" in the scientific domain of biomedicine were identified. This study supports the view that in biomedicine task uncertainty is low and researchers are highly mutually dependent. Intense competition appears to be one factor behind the explosion of data and publications in this domain, which in turn directly shapes the ways information is searched, followed, used and produced. The need for new, easy-to-use services and tools for searching and following information on so-called "hot" topics became apparent.
Originality/value
The study offers new insight into information practices in the biomedical domain. Whitley's theory enabled a thorough analysis of the cultural and social nature of the biomedical domain and proved useful in examining researchers' information practices.

4.
Abstract
The Database of Human Gastric Cancer (DBGC) is a comprehensive database that integrates various human gastric cancer-related data resources. Human gastric cancer-related transcriptomics projects, proteomics projects, mutations, biomarkers and drug-sensitive genes from different sources were collected and unified in this database. Moreover, epidemiological statistics of gastric cancer patients in China and clinicopathological information annotated with gastric cancer cases were also integrated into the DBGC. We believe that this database will greatly facilitate research regarding human gastric cancer in many fields. DBGC is freely available at http://bminfor.tongji.edu.cn/dbgc/index.do

5. Understanding data sharing behaviors of STEM researchers: The roles of attitudes, norms, and data repositories. Library & Information Science Research 2015. DOI: 10.1016/j.lisr.2015.04.006.

6. Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration. PLoS Comput Biol 2015; 11:e1004295. PMID: 26042810; PMCID: PMC4456145; DOI: 10.1371/journal.pcbi.1004295.
Abstract
Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model had yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks, recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, our system inferred the first comprehensive dynamical systems-biology model explaining patterning in planarian regeneration.
This method provides an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form. Developmental and regenerative biology experiments are producing a huge number of morphological phenotypes from functional perturbation experiments. However, existing pathway models do not generally explain the dynamic regulation of anatomical shape due to the difficulty of inferring and testing non-linear regulatory networks responsible for appropriate form, shape, and pattern. We present a method that automates the discovery and testing of regulatory networks explaining morphological outcomes directly from the resultant phenotypes, producing network models as testable hypotheses explaining regeneration data. Our system integrates a formalization of the published results in planarian regeneration, an in silico simulator in which the patterning properties of regulatory networks can be quantitatively tested in a regeneration assay, and a machine learning module that evolves networks whose behavior in this assay optimally matches the database of planarian results. We applied our method to explain the key experiments in planarian regeneration, and discovered the first comprehensive model of anterior-posterior patterning in planaria under surgical, pharmacological, and genetic manipulations. Beyond the planarian data, our approach is readily generalizable to facilitate the discovery of testable regulatory networks in developmental biology and biomedicine, and represents the first developmental model discovered de novo from morphological outcomes by an automated system.
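The evolutionary search at the heart of such a system can be sketched in miniature: candidate "networks" (reduced here to weight vectors, an assumption made for brevity) are scored against a table of experiment outcomes, and a population is evolved toward models that reproduce the data. All names, rates, and the outcome table below are illustrative, not the paper's formalized planarian database:

```python
import random

# Toy outcome table: (experimental inputs, observed outcome).
# These values are illustrative assumptions, not planarian data.
EXPERIMENTS = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]

def respond(net, inputs):
    """Single-layer response of a candidate network with a hard threshold."""
    s = sum(w * x for w, x in zip(net, inputs))
    return 1.0 if s > 0.5 else 0.0

def fitness(net):
    """Negative squared error between simulated and observed outcomes."""
    return -sum((respond(net, i) - out) ** 2 for i, out in EXPERIMENTS)

def evolve(pop_size=30, genes=2, generations=60, seed=1):
    """Evolve a population of candidate networks toward the outcome table."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 + rng.gauss(0, 0.1)  # crossover + mutation
                     for x, y in zip(a, b)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

The actual system evolves full non-linear network topologies tested in a regeneration simulator; this sketch only shows the select-recombine-mutate loop driven by fit to experimental outcomes.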

7.
Abstract
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement.

8. CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research. Front Neuroinform 2014; 8:54. PMID: 24904400; PMCID: PMC4033081; DOI: 10.3389/fninf.2014.00054.
Abstract
The Canadian Brain Imaging Research Platform (CBRAIN) is a web-based collaborative research platform developed in response to the challenges raised by data-heavy, compute-intensive neuroimaging research. CBRAIN offers transparent access to remote data sources, distributed computing sites, and an array of processing and visualization tools within a controlled, secure environment. Its web interface is accessible through any modern browser and uses graphical interface idioms to reduce the technical expertise required to perform large-scale computational analyses. CBRAIN's flexible meta-scheduling has allowed the incorporation of a wide range of heterogeneous computing sites, currently including nine national research High Performance Computing (HPC) centers in Canada, one in Korea, one in Germany, and several local research servers. CBRAIN leverages remote computing cycles and facilitates resource interoperability in a manner transparent to the end user. Compared with typical available grid solutions, our architecture was designed to be easily extendable and deployable on existing remote computing sites with no tool modification, administrative intervention, or special software/hardware configuration. As of October 2013, CBRAIN serves over 200 users spread across 53 cities in 17 countries. The platform is built as a generic framework that can accept data and analysis tools from any discipline; however, its current focus is primarily on neuroimaging research and studies of neurological diseases such as autism, Parkinson's disease, Alzheimer's disease, and multiple sclerosis, as well as on normal brain structure and development. This technical report presents the CBRAIN platform, its current deployment and usage, and future directions.

9. Automated tracking of quantitative assessments of tumor burden in clinical trials. Transl Oncol 2014; 7:23-35. PMID: 24772204; DOI: 10.1593/tlo.13796.
Abstract
There are two key challenges hindering effective use of quantitative assessment of imaging in cancer response assessment: (1) radiologists usually describe cancer lesions in imaging studies subjectively and sometimes ambiguously, and (2) it is difficult to repurpose imaging data, because lesion measurements are not recorded in a format that permits machine interpretation and interoperability. We have developed a freely available software platform on the basis of open standards, the electronic Physician Annotation Device (ePAD), to tackle these challenges in two ways. First, ePAD assists the radiologist in carrying out cancer lesion measurements as part of the routine clinical trial image interpretation workflow. Second, ePAD records all image measurements and annotations in a data format that permits repurposing image data for analyses of alternative imaging biomarkers of treatment response. To determine the impact of ePAD on radiologist efficiency in quantitative assessment of imaging studies, a radiologist evaluated computed tomography (CT) imaging studies from 20 subjects having one baseline and three consecutive follow-up imaging studies with and without ePAD. The radiologist made measurements of target lesions in each imaging study using Response Evaluation Criteria in Solid Tumors (RECIST) 1.1, initially with the aid of ePAD; after a 30-day washout period, the exams were reread without ePAD. The mean total time required to review the images and summarize measurements of target lesions was 15% (P < .039) shorter using ePAD than without the tool. In addition, it was possible to rapidly reanalyze the images to explore lesion cross-sectional area as an alternative imaging biomarker to the linear measure. We conclude that ePAD appears promising for improving reader efficiency in quantitative assessment of CT examinations, and it may enable discovery of novel image-based biomarkers of cancer treatment response.
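The arithmetic behind such response assessments can be sketched as follows: the RECIST 1.1 percent-change cut-offs on the sum of longest diameters, plus an ellipse-based cross-sectional area as a simplified stand-in for the alternative biomarker explored with ePAD. This is a deliberately reduced sketch; it ignores new lesions, nodal rules, and nadir-based progression:

```python
import math

def recist_response(baseline_sld_mm, followup_sld_mm):
    """Classify change in the sum of longest diameters (SLD) using the
    RECIST 1.1 cut-offs. Simplified: ignores new lesions and nodal rules."""
    if followup_sld_mm == 0:
        return "CR"                       # complete response
    change_pct = (followup_sld_mm - baseline_sld_mm) / baseline_sld_mm * 100
    if change_pct <= -30:
        return "PR"                       # partial response
    if change_pct >= 20 and followup_sld_mm - baseline_sld_mm >= 5:
        return "PD"                       # progressive disease
    return "SD"                           # stable disease

def ellipse_area_mm2(long_axis_mm, short_axis_mm):
    """Cross-sectional area from two orthogonal diameters, modeling the
    lesion cross-section as an ellipse."""
    return math.pi * (long_axis_mm / 2) * (short_axis_mm / 2)

category = recist_response(100, 65)   # 35% shrinkage from baseline
```

The area measure illustrates why recording raw measurements matters: with both diameters stored, alternative biomarkers can be recomputed without rereading the exams.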

10. Systems approaches for synthetic biology: a pathway toward mammalian design. Front Physiol 2013; 4:285. PMID: 24130532; PMCID: PMC3793170; DOI: 10.3389/fphys.2013.00285.
Abstract
We review methods of understanding cellular interactions through computation in order to guide the synthetic design of mammalian cells for translational applications, such as regenerative medicine and cancer therapies. In doing so, we argue that the challenges of engineering mammalian cells provide a prime opportunity to leverage advances in computational systems biology. We support this claim systematically, by addressing each of the principal challenges to existing synthetic bioengineering approaches—stochasticity, complexity, and scale—with specific methods and paradigms in systems biology. Moreover, we characterize a key set of diverse computational techniques, including agent-based modeling, Bayesian network analysis, graph theory, and Gillespie simulations, with specific utility toward synthetic biology. Lastly, we examine the mammalian applications of synthetic biology for medicine and health, and how computational systems biology can aid in the continued development of these applications.

11. A meta-composite software development approach for translational research. J Med Syst 2013; 37:9935. PMID: 23504436; PMCID: PMC3634559; DOI: 10.1007/s10916-013-9935-6.
Abstract
Translational researchers conduct research in a highly data-intensive and continuously changing environment and need to use multiple, disparate tools to achieve their goals. These researchers would greatly benefit from meta-composite software development, the ability to continuously compose and recompose tools in response to their ever-changing needs. However, the available tools are largely disconnected, and current software approaches are inefficient and ineffective in their support for meta-composite software development. Building on the composite services development approach, the de facto standard for developing integrated software systems, we propose a concept-map- and agent-based meta-composite software development approach. A crucial step in composite services development is the modeling of users' needs as processes, which can then be specified in an executable format for system composition. Our approach offers two key innovations. First, it allows researchers (who understand their needs best), rather than technicians, to take a leadership role in developing process models, reducing inefficiencies and errors. Second, it allows complex user interactions to be modeled as part of the process, overcoming the technical limitations of current tools. We demonstrate the feasibility of our approach using a real-world translational research use case, and present results of usability studies evaluating the approach for future refinements.

12. MRMPath and MRMutation, facilitating discovery of mass transitions for proteotypic peptides in biological pathways using a bioinformatics approach. Adv Bioinformatics 2013; 2013:527295. PMID: 23424586; PMCID: PMC3570921; DOI: 10.1155/2013/527295.
Abstract
Quantitative proteomics applications in mass spectrometry depend on knowledge of the mass-to-charge ratio (m/z) values of proteotypic peptides for the proteins under study and of their product ions. MRMPath and MRMutation are platform-independent, web-based bioinformatics tools that help biologists recover this information. MRMPath utilizes publicly available information on biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database: all proteins involved in pathways of interest are recovered and processed in silico to extract information relevant to quantitative mass spectrometry analysis, and peptides may also be subjected to automated BLAST analysis to determine whether they are proteotypic. MRMutation catalogs and makes available processed information on known (mutant) protein variants from the current UniProtKB database. All results, drawn via the web from well-maintained public databases, are written to an Excel spreadsheet that the user can download and save. MRMPath and MRMutation can be freely accessed. As a system that allows two or more resources to interoperate, MRMPath represents an advance in bioinformatics tool development; as a practical matter, its automated approach represents significant time savings for researchers.
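The core arithmetic such tools automate, computing a precursor m/z from monoisotopic residue masses, can be sketched as follows. The peptide "SAMPLER" is a hypothetical example, not drawn from MRMPath output:

```python
# Standard monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565     # H2O added for the peptide termini
PROTON = 1.007276     # mass of a proton

def precursor_mz(sequence, charge=2):
    """m/z of the [M + nH]n+ precursor ion for a peptide sequence."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge

mz = precursor_mz("SAMPLER", charge=2)   # a hypothetical tryptic peptide
```

A full MRM transition list would also require product-ion (b/y fragment) m/z values and modification masses; this sketch covers only the precursor calculation.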

13. People, organizational, and leadership factors impacting informatics support for clinical and translational research. BMC Med Inform Decis Mak 2013; 13:20. PMID: 23388243; PMCID: PMC3577661; DOI: 10.1186/1472-6947-13-20.
Abstract
BACKGROUND In recent years, numerous initiatives have been undertaken to describe critical information needs related to the collection, management, analysis, and dissemination of data in support of biomedical research (J Investig Med 54:327-333, 2006); (J Am Med Inform Assoc 16:316-327, 2009); (Physiol Genomics 39:131-140, 2009); (J Am Med Inform Assoc 18:354-357, 2011). A common theme spanning such reports has been the importance of understanding and optimizing people, organizational, and leadership factors in order to achieve the promise of efficient and timely research (J Am Med Inform Assoc 15:283-289, 2008). With the emergence of clinical and translational science (CTS) as a national priority in the United States, and the corresponding growth in the scale and scope of CTS research programs, the acuity of such information needs continues to increase (JAMA 289:1278-1287, 2003); (N Engl J Med 353:1621-1623, 2005); (Sci Transl Med 3:90, 2011). At the same time, systematic evaluations of the optimal people, organizational, and leadership factors that influence the provision of data, information, and knowledge management technologies and methods are notably lacking. METHODS In response to this gap in knowledge, we conducted both: (1) a structured survey of domain experts at Academic Health Centers (AHCs); and (2) a subsequent thematic analysis of public-domain documentation provided by those same organizations. The results of these approaches were then used to identify critical factors that may influence access to informatics expertise and resources relevant to the CTS domain. RESULTS A total of 31 domain experts, spanning the Biomedical Informatics (BMI), Computer Science (CS), Information Science (IS), and Information Technology (IT) disciplines, participated in a structured survey process. At a high level, respondents identified notable differences in access to BMI, CS, and IT expertise and services depending on the establishment of a formal BMI academic unit and the perceived relationship between BMI, CS, IS, and IT leaders. Subsequent thematic analysis of the aforementioned public-domain documents demonstrated a discordance between perceived and reported integration across and between BMI, CS, IS, and IT programs and leaders relevant to the CTS domain. CONCLUSION Differences in people, organizational, and leadership factors do influence the effectiveness of CTS programs, particularly with regard to the ability to access and leverage BMI, CS, IS, and IT expertise and resources. Based on this finding, we believe that developing a better understanding of how optimal BMI, CS, IS, and IT organizational structures and leadership models are designed and implemented is critical both to the advancement of CTS and, ultimately, to improvements in the quality, safety, and effectiveness of healthcare.

14. Translational researchers' perceptions of data management practices and data curation needs: Findings from a focus group in an academic health sciences library. Journal of Web Librarianship 2012. DOI: 10.1080/19322909.2012.730375.

15. SOA-based digital library services and composition in biomedical applications. Computer Methods and Programs in Biomedicine 2012; 106:219-233. PMID: 20846740; DOI: 10.1016/j.cmpb.2010.08.009.
Abstract
Carefully collected, high-quality data are crucial in biomedical visualization, and it is important that the user community has ready access both to these data and to the high-performance computing resources needed by the complex computational algorithms that will process them. Biological researchers generally require data, tools and algorithms from multiple providers to achieve their goals; this paper describes our response to the problems that result. The Living Human Digital Library (LHDL) project presented here has taken advantage of Web Services to build a biomedical digital library infrastructure that allows clinicians and researchers not only to preserve, trace and share data resources, but also to collaborate at the data-processing level.

16. Human factors in four cases of e-collaboration in biomedical research. International Journal of e-Collaboration 2012. DOI: 10.4018/jec.2012040102.
Abstract
There are compelling arguments for using internet technologies to facilitate research in the biomedical sciences. This project sought to fill a gap in empirical studies of e-collaboration use by biomedical research teams through a study of four cases, based in the research precinct of one Australian university, in which teams collaborated with international researchers. Researchers were found to hold mixed beliefs and to show varying degrees of systematic thinking about how, when and why e-collaboration supported their activities. It appears that researchers need assistance to conceptualise and articulate what works in order to transform their e-collaboration practices into increased scientific efficiency and productivity.

17. When humans are the exception: cross-species databases at the interface of biological and clinical research. Social Studies of Science 2012; 42:214-36. PMID: 22848998; DOI: 10.1177/0306312711436265.
Abstract
Cross-species comparison has long been regarded as a stepping-stone for medical research, enabling the discovery and testing of prospective treatments before they undergo clinical trial on humans. Post-genomic medicine has made cross-species comparison crucial in another respect: the 'community databases' developed to collect and disseminate data on model organisms are now often used as a template for the dissemination of data on humans and as a tool for comparing results of medical significance across the human-animal boundary. This paper identifies and discusses four key problems encountered by database curators when integrating human and non-human data within the same database: (1) picking criteria for what counts as reliable evidence, (2) selecting metadata, (3) standardising and describing research materials and (4) choosing nomenclature to classify data. An analysis of these hurdles reveals epistemic disagreement and controversies underlying cross-species comparisons, which in turn highlight important differences in the experimental cultures of biologists and clinicians trying to make sense of these data. By considering database development through the eyes of curators, this study casts new light on the complex conjunctions of biological and clinical practice, model organisms and human subjects, and material and virtual sources of evidence, thus emphasizing the fragmented, localized and inherently translational nature of biomedicine.

18.
Abstract
Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge.

19. Cyberinfrastructure and the biomedical sciences. Am J Prev Med 2011; 40:S97-102. PMID: 21521604; PMCID: PMC7020667; DOI: 10.1016/j.amepre.2011.01.006.

20. Grid-enabled measures: using Science 2.0 to standardize measures and share data. Am J Prev Med 2011; 40:S134-43. PMID: 21521586; PMCID: PMC3088871; DOI: 10.1016/j.amepre.2011.01.004.
Abstract
Scientists are taking advantage of the Internet and collaborative web technology to accelerate discovery in a massively connected, participative environment, a phenomenon referred to by some as Science 2.0. As a new way of doing science, it has the potential to push science forward more efficiently than was previously possible. The Grid-Enabled Measures (GEM) database was conceptualized by the National Cancer Institute (NCI) as an instantiation of Science 2.0 principles, with two overarching goals: (1) promote the use of standardized measures tied to theoretically based constructs; and (2) facilitate the sharing of harmonized data resulting from the use of standardized measures. The first goal is accomplished by creating an online venue where a virtual community of researchers can collaborate and reach consensus on measures by rating, commenting on, and viewing metadata about the measures and associated constructs. The second is accomplished by connecting the constructs and measures to an ontological framework with data standards and common data elements, such as the NCI Enterprise Vocabulary System (EVS) and the cancer Data Standards Repository (caDSR). This paper describes the Web 2.0 principles on which the GEM database is based, describes its functionality, and discusses some of the important issues involved in creating the GEM database, such as the role of mutually agreed-on ontologies (i.e., knowledge categories and the relationships among them) for data sharing.
|
21
|
Abstract
We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.
|
22
|
An informatics project and online "Knowledge Centre" supporting modern genotype-to-phenotype research. Hum Mutat 2011; 32:543-50. [PMID: 21438073 DOI: 10.1002/humu.21469] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2011] [Accepted: 01/28/2011] [Indexed: 11/06/2022]
Abstract
Explosive growth in the generation of genotype-to-phenotype (G2P) data necessitates a concerted effort to tackle the logistical and informatics challenges this presents. The GEN2PHEN Project represents one such effort, with a broad strategy of uniting disparate G2P resources into a hybrid centralized-federated network. This is achieved through a holistic strategy focussed on three overlapping areas: data input standards and pipelines through which to submit and collect data (data in); federated, independent, extendable, yet interoperable database platforms on which to store and curate widely diverse datasets (data storage); and data formats and mechanisms with which to exchange, combine, and extract data (data exchange and output). To fully leverage this data network, we have constructed the "G2P Knowledge Centre" (http://www.gen2phen.org). This central platform provides holistic searching of the G2P data domain allied with facilities for data annotation and user feedback, access to extensive G2P and informatics resources, and tools for constructing online working communities centered on the G2P domain. Through the efforts of GEN2PHEN, and through combining data with broader community-derived knowledge, the Knowledge Centre opens up exciting possibilities for organizing, integrating, sharing, and interpreting new waves of G2P data in a collaborative fashion.
|
23
|
Realizing the promise of Web 2.0: engaging community intelligence. JOURNAL OF HEALTH COMMUNICATION 2011; 16 Suppl 1:10-31. [PMID: 21843093 PMCID: PMC3224889 DOI: 10.1080/10810730.2011.589882] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Discussions of Health 2.0, a term first coined in 2005, were guided by three main tenets: (a) health was to involve more participation, because an evolution in the web encouraged more direct consumer engagement in their own health care; (b) data was to become the new "Intel Inside" for systems supporting the vital decisions in health; and (c) a sense of collective intelligence from the network would supplement traditional sources of knowledge in health decision making. Interests in understanding the implications of a new paradigm for patient engagement in health and health care were kindled by findings from surveys such as the National Cancer Institute's Health Information National Trends Survey, showing that patients were quick to look online for information to help them cope with disease. This article considers how these 3 facets of Health 2.0--participation, data, and collective intelligence--can be harnessed to improve the health of the nation according to Healthy People 2020 goals. The authors begin with an examination of evidence from behavioral science to understand how Web 2.0 participative technologies may influence patient processes and outcomes, for better or worse, in an era of changing communication technologies. The article then focuses specifically on the clinical implications of Health 2.0 and offers recommendations to ensure that changes in the communication environment do not detract from national (e.g., Healthy People 2020) health goals. Changes in the clinical environment, as catalyzed by the Health Information Technology for Economic and Clinical Health Act to take advantage of Health 2.0 principles in evidence-based ways, are also considered.
|
24
|
|
25
|
Abstract
Advances in digital data acquisition, analysis, and storage have revolutionized the work in many biological disciplines such as genomics, molecular phylogenetics, and structural biology, but have not yet found satisfactory acceptance in morphology. Improvements in non-invasive imaging and three-dimensional visualization techniques, however, now also permit high-throughput analyses of whole biological specimens, including museum material. These developments pave the way towards a digital era in morphology. Using sea urchins (Echinodermata: Echinoidea), we provide examples illustrating the power of these techniques. However, remote visualization, the creation of a specialized database, and the implementation of standardized, world-wide accepted data deposition practices prior to publication are essential to cope with the foreseeable exponential increase in digital morphological data.
|
26
|
A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid. CONCURRENCY AND COMPUTATION : PRACTICE & EXPERIENCE 2010; 22:1098-1117. [PMID: 20625534 PMCID: PMC2901112 DOI: 10.1002/cpe.1547] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
With the emergence of "service oriented science," the need arises to orchestrate multiple services to facilitate scientific investigation, that is, to create "science workflows." We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL, as an imperative language, offers a comprehensive set of modeling primitives for workflows of all flavors, while Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight on how a workflow language and tool can fulfill the requirements of the scientific community.
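The imperative-versus-dataflow contrast drawn in this abstract can be sketched in miniature. The sketch below uses plain Python stand-ins, not BPEL or Taverna; the pipeline steps and sample data are invented for illustration:

```python
from functools import reduce

# Imperative style (BPEL-like): explicit ordered steps with named
# intermediate state between them.
def imperative(samples):
    normalized = [s / max(samples) for s in samples]
    filtered = [s for s in normalized if s > 0.5]
    return sorted(filtered)

# Dataflow style (Taverna-like): each processor consumes the upstream
# output; the workflow is the composition of processors itself, which
# is what enables pipelined execution.
def dataflow(samples, processors):
    return reduce(lambda data, proc: proc(data), processors, samples)

pipeline = [
    lambda xs: [x / max(xs) for x in xs],
    lambda xs: [x for x in xs if x > 0.5],
    sorted,
]

print(imperative([2, 8, 6]) == dataflow([2, 8, 6], pipeline))  # True
```

Both formulations compute the same result; the difference is that the dataflow version makes the wiring between processors an explicit, inspectable object.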
|
27
|
Bioinformatics: Tools to accelerate population science and disease control research. Am J Prev Med 2010; 38:646-51. [PMID: 20494241 DOI: 10.1016/j.amepre.2010.03.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/10/2009] [Revised: 03/04/2010] [Accepted: 03/04/2010] [Indexed: 11/18/2022]
Abstract
Population science and disease control researchers can benefit from a more proactive approach to applying bioinformatics tools for clinical and public health research. Bioinformatics utilizes principles of information sciences and technologies to transform vast, diverse, and complex life sciences data into a more coherent format for wider application. Bioinformatics provides the means to collect and process data, enhance data standardization and harmonization for scientific discovery, and merge disparate data sources. Achieving interoperability (i.e., the development of an informatics system that provides access to and use of data from different systems) will facilitate scientific exploration, careers, and opportunities for interventions in population health. The National Cancer Institute's (NCI's) interoperable cancer Biomedical Informatics Grid (caBIG) is one of a number of illustrative tools in this report that are being mined by population scientists. Tools are not all that is needed for progress. Challenges persist, including a lack of common data standards, proprietary barriers to data access, and difficulties pooling data from studies. Population scientists and informaticists are developing promising and innovative solutions to these barriers. The purpose of this paper is to describe how the application of bioinformatics systems can accelerate population health research across the continuum from prevention to detection, diagnosis, treatment, and outcome.
|
28
|
Abstract
With advances in high-throughput techniques, the volume of data generated has resulted in the creation of a plethora of resources for the cancer research community. However, a key factor in the utility, sustainability and future use of a novel resource lies in its ability to allow for data sharing and to be interoperable with major international cancer research efforts. This article will introduce some of these efforts, the interoperable cancer data-mining resources and repositories, from a user-perspective. Some of the considerations to be addressed when building interoperable, sustainable cancer resources will be discussed with case studies-hoping this will prove useful for researchers designing their own cancer databases.
|
29
|
Abstract
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.
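The core transformation this abstract describes, turning semantically annotated UML classes into OWL statements, can be illustrated with a deliberately naive sketch. The class name, base IRI, and Turtle naming scheme below are invented for illustration and are not Corvus's actual conventions; prefix declarations for `owl:` and `rdfs:` are omitted:

```python
def uml_class_to_owl(cls_name, attributes, base="http://example.org/model#"):
    """Emit Turtle-style triples declaring a UML class as an owl:Class and
    each of its attributes as a datatype property with that class as domain."""
    lines = [f"<{base}{cls_name}> a owl:Class ."]
    for attr in attributes:
        lines.append(f"<{base}{cls_name}.{attr}> a owl:DatatypeProperty ;")
        lines.append(f"    rdfs:domain <{base}{cls_name}> .")
    return "\n".join(lines)

# Hypothetical caTissue-like class with two attributes
turtle = uml_class_to_owl("Specimen", ["specimenType", "collectionDate"])
print(turtle)
```

A real transformation would also carry over the UML model's semantic annotations (concept codes) so that the resulting ontology preserves them, which is the point of the Corvus approach.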
|
30
|
Abstract
The National Cancer Institute Enterprise Vocabulary Services (NCI EVS) uses a wide range of quality assurance (QA) techniques to maintain and extend NCI Thesaurus (NCIt). NCIt is a reference terminology and biomedical ontology used in a growing number of NCI and other systems that extend from translational and basic research through clinical care to public information and administrative activities. Both automated and manual QA techniques are employed throughout the editing and publication cycle, which includes inserting and editing NCIt in NCI Metathesaurus. NCI EVS conducts its own additional periodic and ongoing content QA. External reviews, and extensive evaluation by and interaction with EVS partners and other users, have also played an important part in the QA process. There have always been tensions and compromises between meeting the needs of dependent systems and providing consistent and well-structured content; external QA and feedback have been important in identifying and addressing such issues. Currently, NCI EVS is exploring new approaches to broaden external participation in the terminology development and QA process.
|
31
|
Somatic mutation databases as tools for molecular epidemiology and molecular pathology of cancer: proposed guidelines for improving data collection, distribution, and integration. Hum Mutat 2009; 30:275-82. [PMID: 19006239 DOI: 10.1002/humu.20832] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
There are currently fewer than 40 locus-specific databases (LSDBs) and one large general database that curate data on somatic mutations in human cancer genes. These databases differ in scope and use different annotation standards and database systems, resulting in duplicated curation efforts and making it difficult for users to find clear and consistent information. As data related to somatic mutations are generated at an increasing pace, it is urgent to create a framework for improving the collection of this information and making it more accessible to clinicians, scientists, and epidemiologists to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases, and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS), with additions related to the specific requirements of somatic mutations in cancer. Indeed, somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, help treatment choice, predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Thus, specific annotations are required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues and the development of adequate resources that would avoid the loss of extremely valuable information generated by years of basic and clinical research.
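The HGVS annotation standard this proposal builds on is concrete enough to sketch. The following is a minimal parser for one variant class only, simple coding-DNA substitutions; real HGVS nomenclature covers many more variant types (deletions, insertions, duplications, etc.), so this is an illustration of the standard's value for machine-readable curation, not a complete implementation:

```python
import re

# Minimal pattern for HGVS coding-DNA substitutions, e.g. "NM_004333.4:c.1799T>A".
HGVS_SUB = re.compile(
    r"(?P<refseq>[A-Z]{2}_\d+(?:\.\d+)?)"  # reference sequence accession
    r":c\.(?P<pos>\d+)"                     # coding-DNA position
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"      # reference > alternate base
)

def parse_hgvs_substitution(name):
    """Split a simple HGVS c. substitution into its structured fields."""
    m = HGVS_SUB.fullmatch(name)
    if m is None:
        raise ValueError(f"not a simple HGVS c. substitution: {name!r}")
    fields = m.groupdict()
    fields["pos"] = int(fields["pos"])
    return fields

# BRAF c.1799T>A (p.V600E), a substitution frequently curated in somatic databases
print(parse_hgvs_substitution("NM_004333.4:c.1799T>A"))
```

Structured fields like these are what allow different LSDBs to exchange and cross-reference the same mutation consistently.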
|
32
|
Owner controlled data exchange in nutrigenomic collaborations: the NuGO information network. GENES AND NUTRITION 2009; 4:113-22. [PMID: 19408032 DOI: 10.1007/s12263-009-0123-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/09/2008] [Accepted: 04/16/2009] [Indexed: 10/20/2022]
Abstract
New 'omics' technologies are changing nutritional sciences research. They enable researchers to tackle increasingly complex questions, but also increase the need for collaboration between research groups. An important challenge for successful collaboration is the management and structured exchange of the information that accompanies data-intense technologies. NuGO, the European Nutrigenomics Organization, the major collaborative network in molecular nutritional sciences, is supporting the application of modern information technologies in this area. We have developed and implemented a concept for data management and computing infrastructure that supports collaboration between nutrigenomics researchers. The system fills the gap between "private" storage with occasional file sharing by email and the use of centralized databases. It provides flexible tools to share data, including during experiments, while preserving ownership. The NuGO Information Network is a decentralized, distributed system for data exchange based on standard web technology. Secure access to data, maintained by the individual researcher, is enabled by web services based on the BioMoby framework. A central directory provides information about available web services. The flexibility of the infrastructure allows a wide variety of services for data processing and integration by combining several web services, including public services. Therefore, this integrated information system is suited for other research collaborations.
|
33
|
Development of the Lymphoma Enterprise Architecture Database: a caBIG Silver level compliant system. Cancer Inform 2009; 8:45-64. [PMID: 19492074 PMCID: PMC2675136 DOI: 10.4137/cin.s940] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Lymphomas are the fifth most common cancer in United States with numerous histological subtypes. Integrating existing clinical information on lymphoma patients provides a platform for understanding biological variability in presentation and treatment response and aids development of novel therapies. We developed a cancer Biomedical Informatics Grid™ (caBIG™) Silver level compliant lymphoma database, called the Lymphoma Enterprise Architecture Data-system™ (LEAD™), which integrates the pathology, pharmacy, laboratory, cancer registry, clinical trials, and clinical data from institutional databases. We utilized the Cancer Common Ontological Representation Environment Software Development Kit (caCORE SDK) provided by National Cancer Institute’s Center for Bioinformatics to establish the LEAD™ platform for data management. The caCORE SDK generated system utilizes an n-tier architecture with open Application Programming Interfaces, controlled vocabularies, and registered metadata to achieve semantic integration across multiple cancer databases. We demonstrated that the data elements and structures within LEAD™ could be used to manage clinical research data from phase 1 clinical trials, cohort studies, and registry data from the Surveillance Epidemiology and End Results database. This work provides a clear example of how semantic technologies from caBIG™ can be applied to support a wide range of clinical and research tasks, and integrate data from disparate systems into a single architecture. This illustrates the central importance of caBIG™ to the management of clinical and biological data.
|
34
|
Abstract
Background This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes, and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG™ framework or other frameworks that use metadata repositories. Results The Dice (di-grams) and Dynamic algorithms are compared, and both perform similarly in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially any framework that uses a metadata repository. Conclusion This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to facilitating the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle the enormous amounts of diverse data that can be leveraged from new biomedical methodologies.
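The Dice di-gram comparison named in the Results can be sketched as follows. This is not the paper's actual implementation, and the UML attribute and CDE candidates below are hypothetical; the sketch only shows the general technique of scoring character di-gram overlap:

```python
def digrams(s):
    """Case-folded set of two-character substrings of a string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice_similarity(a, b):
    """Dice coefficient over character di-grams: 2*|A & B| / (|A| + |B|)."""
    da, db = digrams(a), digrams(b)
    if not da and not db:
        return 1.0
    return 2 * len(da & db) / (len(da) + len(db))

def best_cde_match(attribute, cde_pairs):
    """Rank candidate CDE (class, property) pairs against a UML attribute name."""
    scored = [(dice_similarity(attribute, f"{cls} {prop}"), cls, prop)
              for cls, prop in cde_pairs]
    return max(scored)

# Hypothetical CDE candidates for the UML attribute "patientBirthDate"
candidates = [("Patient", "birthDate"), ("Specimen", "collectionDate")]
print(best_cde_match("patientBirthDate", candidates))
```

Lexical scores like this give the "baseline" matches the abstract mentions; a human curator would still confirm the semantic fit.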
|
35
|
Incorporating collaboratory concepts into informatics in support of translational interdisciplinary biomedical research. Int J Med Inform 2009; 78:10-21. [PMID: 18706852 PMCID: PMC2606933 DOI: 10.1016/j.ijmedinf.2008.06.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2007] [Revised: 06/16/2008] [Accepted: 06/30/2008] [Indexed: 10/21/2022]
Abstract
Due to its complexity, modern biomedical research has become increasingly interdisciplinary and collaborative. Although a necessity, interdisciplinary biomedical collaboration is difficult. There is, however, a growing body of literature on the study and fostering of collaboration in fields such as computer supported cooperative work (CSCW) and information science (IS). These studies of collaboration provide insight into how to potentially alleviate the difficulties of interdisciplinary collaborative research. We, therefore, undertook a cross-cutting study of science and engineering collaboratories to identify emergent themes. We review many relevant collaboratory concepts: (a) general collaboratory concepts across many domains: communication, common workspace and coordination, and data sharing and management; (b) specific collaboratory concepts of particular biomedical relevance: data integration and analysis, security structure, metadata and data provenance, and interoperability and data standards; (c) critical environmental factors that support collaboratories: administrative and management structure, technical support, and available funding; and (d) future considerations for biomedical collaboration: appropriate training and long-term planning. In our opinion, the collaboratory concepts we discuss can guide the planning and design of future collaborative infrastructure by biomedical informatics researchers to alleviate some of the difficulties of interdisciplinary biomedical collaboration.
|
36
|
High performance computing in structural determination by electron cryomicroscopy. J Struct Biol 2008; 164:1-6. [DOI: 10.1016/j.jsb.2008.07.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2008] [Revised: 07/04/2008] [Accepted: 07/07/2008] [Indexed: 10/21/2022]
|
37
|
Abstract
Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as "thought in cold storage," and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.
|
38
|
Of mice and mentors: developing cyber-infrastructure to support transdisciplinary scientific collaboration. Am J Prev Med 2008; 35:S235-9. [PMID: 18619404 PMCID: PMC2597470 DOI: 10.1016/j.amepre.2008.05.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/21/2007] [Revised: 08/21/2008] [Accepted: 05/08/2008] [Indexed: 11/18/2022]
|
39
|
Abstract
The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.
|
40
|
|
41
|
A semantic grid infrastructure enabling integrated access and analysis of multilevel biomedical data in support of postgenomic clinical trials on cancer. IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE : A PUBLICATION OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2008; 12:205-17. [PMID: 18348950 DOI: 10.1109/titb.2007.903519] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
This paper reports on original results of the Advancing Clinico-Genomic Trials on Cancer integrated project, focusing on the design and development of a European biomedical grid infrastructure in support of multicentric, postgenomic clinical trials (CTs) on cancer. Postgenomic CTs use multilevel clinical and genomic data and advanced computational analysis and visualization tools to test hypotheses in trying to identify the molecular reasons for a disease and the stratification of patients in terms of treatment. This paper presents the needs of users involved in postgenomic CTs in the form of scenarios, which drive the requirements engineering phase of the project. Subsequently, the initial architecture specified by the project is presented, and its services are classified and discussed. A key set of such services are those used for wrapping heterogeneous clinical trial management systems and other public biological databases. Also, the main technological challenge, i.e., the design and development of semantically rich grid services, is discussed. In achieving such an objective, extensive use of ontologies and metadata is required. The Master Ontology on Cancer, developed by the project, is presented, and our approach to developing the required metadata registries, which provide semantically rich information about available data and computational services, is described. Finally, a short discussion of the work lying ahead is included.
|
42
|
DOORS to the Semantic Web and Grid With a PORTAL for Biomedical Computing. ACTA ACUST UNITED AC 2008; 12:191-204. [DOI: 10.1109/titb.2007.905861] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
43
|
The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 2008; 25:1251-5. [PMID: 17989687 DOI: 10.1038/nbt1346] [Citation(s) in RCA: 1139] [Impact Index Per Article: 71.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
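The integration payoff the OBO abstract describes, annotating multiple bodies of data with a common controlled vocabulary, can be shown in miniature. The GO identifiers below are real terms (GO:0006915, apoptotic process; GO:0008150, biological_process), but the datasets and gene/probe names are invented for illustration:

```python
# Two independently produced datasets annotated with the same controlled
# vocabulary can be joined on shared term identifiers even though their
# local keys (gene symbols vs. probe IDs) differ.
dataset_a = {"gene1": ["GO:0006915"], "gene2": ["GO:0008150"]}
dataset_b = {"probeX": ["GO:0006915"]}

def items_sharing_term(term, *annotation_sets):
    """For each annotated dataset, list the items carrying a given ontology term."""
    return [sorted(key for key, terms in ds.items() if term in terms)
            for ds in annotation_sets]

print(items_sharing_term("GO:0006915", dataset_a, dataset_b))
```

Without the shared vocabulary, linking `gene1` to `probeX` would require fragile string matching; with it, the join is a lookup, which is exactly why ontology proliferation (many incompatible vocabularies) undermines integration.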
|
44
|
VennMaster: area-proportional Euler diagrams for functional GO analysis of microarrays. BMC Bioinformatics 2008; 9:67. [PMID: 18230172 PMCID: PMC2335321 DOI: 10.1186/1471-2105-9-67] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2007] [Accepted: 01/29/2008] [Indexed: 11/10/2022] Open
Abstract
Background Microarray experiments generate vast amounts of data. The functional context of differentially expressed genes can be assessed by querying the Gene Ontology (GO) database via GoMiner. Directed acyclic graph representations, which are used to depict GO categories enriched with differentially expressed genes, are difficult to interpret and, depending on the particular analysis, may not be well suited for formulating new hypotheses. Additional graphical methods are therefore needed to augment the GO graphical representation. Results We present an alternative visualization approach, area-proportional Euler diagrams, showing set relationships with semi-quantitative size information in a single diagram to support biological hypothesis formulation. The cardinalities of sets and intersection sets are represented by area-proportional Euler diagrams and their corresponding graphical (circular or polygonal) intersection areas. Optimally proportional representations are obtained using swarm and evolutionary optimization algorithms. Conclusion VennMaster's area-proportional Euler diagrams effectively structure and visualize the results of a GO analysis by indicating to what extent flagged genes are shared by different categories. In addition to reducing the complexity of the output, the visualizations facilitate generation of novel hypotheses from the analysis of seemingly unrelated categories that share differentially expressed genes.
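The area-proportionality at the heart of VennMaster reduces to a simple relation: since a circle's area is pi*r^2, drawing each GO category with radius proportional to the square root of its gene count keeps areas proportional to cardinalities. The sketch below shows only that relation, with hypothetical category sizes; the hard part VennMaster actually solves, placing circles or polygons so that intersection areas are also proportional, is what requires the swarm and evolutionary optimization and is not attempted here:

```python
import math

def proportional_radii(category_sizes, scale=1.0):
    """Circle radii whose areas are proportional to per-category gene counts.

    r = scale * sqrt(n / pi) gives area = scale^2 * n, i.e. exactly
    proportional to cardinality n.
    """
    return {name: scale * math.sqrt(n / math.pi)
            for name, n in category_sizes.items()}

# Hypothetical GO categories with their differentially expressed gene counts
sizes = {"apoptosis": 50, "cell cycle": 200}
radii = proportional_radii(sizes)

area_ratio = (radii["cell cycle"] / radii["apoptosis"]) ** 2
print(round(area_ratio, 6))  # → 4.0, matching the 200/50 cardinality ratio
```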
|
45
|
Abstract
This paper describes the AnnoCryst system, a tool designed to enable authenticated collaborators to share online discussions about 3D crystallographic structures through the asynchronous attachment, storage, and retrieval of annotations. Annotations are personal comments, interpretations, questions, assessments, or references that can be attached to files, data, digital objects, or Web pages. The AnnoCryst system enables annotations to be attached to 3D crystallographic models retrieved from either private local repositories (e.g., Fedora) or public online databases (e.g., Protein Data Bank or Inorganic Crystal Structure Database) via a Web browser. The system uses the Jmol plugin for viewing and manipulating the 3D crystal structures but extends Jmol by providing an additional interface through which annotations can be created, attached, stored, searched, browsed, and retrieved. The annotations are stored on a standardized Web annotation server (Annotea), which has been extended to support 3D macromolecular structures. Finally, the system is embedded within a security framework that is capable of authenticating users and restricting access only to trusted colleagues.
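The core data model here, a free-text comment bound to a target structure plus author and timestamp, can be sketched as a minimal in-memory store. This is a hypothetical illustration of the concept only; the field names are invented and this is not the AnnoCryst or Annotea API, which stores annotations on a remote server rather than in a local list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Annotation:
    """A personal comment attached to one 3D structure (fields illustrative)."""
    target_id: str   # e.g. the ID of a structure in a repository
    author: str      # the authenticated collaborator who wrote the note
    body: str        # the comment, question, or interpretation itself
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

store = []  # local stand-in for a remote annotation server

def attach(target_id, author, body):
    """Create an annotation and persist it in the store."""
    ann = Annotation(target_id, author, body)
    store.append(ann)
    return ann

def retrieve(target_id):
    """All annotations previously attached to one structure."""
    return [a for a in store if a.target_id == target_id]
```

A real deployment would replace `store` with calls to the annotation server and check the caller's identity before `attach` and `retrieve`, mirroring the security framework the abstract describes.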
|
46
|
Open source software projects of the caBIG In Vivo Imaging Workspace Software special interest group. J Digit Imaging 2007; 20 Suppl 1:94-100. [PMID: 17846835 PMCID: PMC2039820 DOI: 10.1007/s10278-007-9061-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2007] [Revised: 07/14/2007] [Accepted: 07/16/2007] [Indexed: 10/28/2022] Open
Abstract
The Cancer Bioinformatics Grid (caBIG) program was created by the National Cancer Institute to facilitate sharing of IT infrastructure, data, and applications among the National Cancer Institute-sponsored cancer research centers. The program was launched in February 2004 and now links more than 50 cancer centers. In April 2005, the In Vivo Imaging Workspace was added to promote the use of imaging in cancer clinical trials. At the inaugural meeting, four special interest groups (SIGs) were established. The Software SIG was charged with identifying projects that focus on open-source software for image visualization and analysis. To date, two projects have been defined by the Software SIG. The eXtensible Imaging Platform project has produced a rapid application development environment that researchers may use to create targeted workflows customized for specific research projects. The Algorithm Validation Tools project will provide a set of tools and data structures that will be used to capture measurement information and associated metadata needed to allow a gold standard to be defined for a given database, against which change analysis algorithms can be tested. Through these and future efforts, the caBIG In Vivo Imaging Workspace Software SIG endeavors to advance imaging informatics and provide new open-source software tools to advance cancer research.
|
47
|
Abstract
Due to the huge volume and complexity of biological data available today, a fundamental component of biomedical research is now in silico analysis. This includes modelling and simulation of biological systems and processes, as well as automated bioinformatics analysis of high-throughput data. The quest for bioinformatics resources (including databases, tools, and knowledge) therefore becomes extremely important. Bioinformatics itself is in rapid evolution, and dedicated Grid cyberinfrastructures already offer easier access and sharing of resources. Furthermore, the concept of the Grid is progressively interleaving with those of Web Services, semantics, and software agents. Agent-based systems can play a key role in learning, planning, interaction, and coordination. Agents also constitute a natural paradigm for engineering simulations of complex systems like molecular ones. We present here an agent-based, multilayer architecture for bioinformatics Grids. It is intended to support both the execution of complex in silico experiments and the simulation of biological systems. In the architecture, a pivotal role is assigned to an "alive" semantic index of resources, which is also expected to facilitate users' awareness of the bioinformatics domain.
|
48
|
Validation of novel optical imaging technologies: the pathologists' view. JOURNAL OF BIOMEDICAL OPTICS 2007; 12:051801. [PMID: 17994879 DOI: 10.1117/1.2795569] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
Noninvasive optical imaging technology has the potential to improve the accuracy of disease detection and predict treatment response. Pathology provides the critical link between the biological basis of an image or spectral signature and clinical outcomes obtained through optical imaging. The validation of optical images and spectra requires both morphologic diagnosis from histopathology and parametric analysis of tissue features above and beyond the declared pathologic "diagnosis." Enhancement of optical imaging modalities with exogenously applied biomarkers also requires validation of the biological basis for molecular contrast. For an optical diagnostic or prognostic technology to be useful, it must be clinically important, independently informative, and of demonstrated beneficial value to patient care. Its usage must be standardized with regard to methods, interpretation, reproducibility, and reporting, in which the pathologist plays a key role. By providing insight into disease pathobiology, interpretive or quantitative analysis of tissue material, and expertise in molecular diagnosis, the pathologist should be an integral part of any team that is validating novel optical imaging modalities. This review will consider (1) the selection of validation biomarkers; (2) standardization in tissue processing, diagnosis, reporting, and quantitative analysis; (3) the role of the pathologist in study design; and (4) reference standards, controls, and interobserver variability.
|
49
|
Abstract
As clinical trials continue to expand and evolve to include a wider range of collected information, the amount and variety of information available to clinical researchers has concurrently grown. This article describes a range of means to address this complexity and to accommodate the collection, storage, and integration of this information based on current approaches in biomedical informatics. By reviewing these current approaches, and drawing examples from actual practice within the clinical informatics community, a range of potential solutions and their potential impacts are discussed.
|
50
|
BAAQ: an infrastructure for application integration and knowledge discovery in bioinformatics. IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE : A PUBLICATION OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2007; 11:428-34. [PMID: 17674625 DOI: 10.1109/titb.2006.888700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
The emerging grid computing technologies enable bioinformatics scientists to conduct their research in a virtual laboratory, in which they share public databases, computational tools, and their analysis workflows. However, the development of grid applications is still a nightmare for general bioinformatics scientists, due to the lack of grid programming environments, standards, and high-level services. Here, we present a system, which we named Bioinformatics: Ask Any Questions (BAAQ), to automate this development procedure as much as possible. BAAQ allows scientists to store and manage remote biological data and programs, to build analysis workflows that integrate these resources seamlessly, and to discover knowledge from available resources. This paper addresses two issues in building grid applications in bioinformatics: how to smoothly compose an analysis workflow using heterogeneous resources, and how to efficiently discover and re-use available resources in the grid community. Correspondingly, an intelligent grid programming environment and an active solution recommendation service are proposed. Finally, we present a case study applying BAAQ to a bioinformatics problem.
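The workflow composition the BAAQ abstract describes, chaining heterogeneous tools so that one step's output feeds the next, can be reduced to a small sketch. The step functions below are invented local stand-ins for remote grid resources; this illustrates the composition pattern only, not the BAAQ system or its services.

```python
# Sketch of analysis-workflow composition: each step is a callable, and a
# workflow is just their left-to-right chaining. Step names are invented.

def compose(*steps):
    """Chain steps so each step's output becomes the next step's input."""
    def workflow(data):
        for step in steps:
            data = step(data)
        return data
    return workflow

# Stand-ins for remote tools wrapped as local callables.
def fetch_sequences(ids):
    """Pretend to download FASTA records for a list of sequence IDs."""
    return [f">{seq_id}\nACGT" for seq_id in ids]

def count_records(records):
    """Summarize the previous step's output."""
    return len(records)

pipeline = compose(fetch_sequences, count_records)
result = pipeline(["seq1", "seq2", "seq3"])
```

In a real grid setting each step would wrap a remote service invocation, and the environment's job would be checking that adjacent steps' data types are compatible before the chain runs.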
|