1
|
A practical guide to data management and sharing for biomedical laboratory researchers. Exp Neurol 2024; 378:114815. [PMID: 38762093 DOI: 10.1016/j.expneurol.2024.114815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 05/13/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Effective data management and sharing have become increasingly crucial in biomedical research; however, many laboratory researchers lack the necessary tools and knowledge to address this challenge. This article, written by practicing scientists, gives laboratory researchers an introductory guide to research data management (RDM) and to the importance of the FAIR (Findable, Accessible, Interoperable, and Reusable) data-sharing principles. We explore the advantages of implementing organized data management strategies and introduce key concepts such as data standards, data documentation, and the distinction between machine-readable and human-readable data formats. Furthermore, we offer practical guidance for creating a data management plan and establishing efficient data workflows within the laboratory setting, suitable for labs of all sizes. This includes an examination of requirements analysis, the development of a data dictionary for routine data elements, the implementation of unique subject identifiers, and the formulation of standard operating procedures (SOPs) for seamless data flow. To aid researchers in implementing these practices, we present a simple organizational system as an illustrative example, which can be tailored to suit individual needs and research requirements. By presenting a user-friendly approach, this guide serves as an introduction to the field of RDM and offers practical tips to help researchers meet the data management and sharing mandates that are rapidly becoming prevalent in biomedical research.
|
2
|
NIDM-Terms: community-based terminology management for improved neuroimaging dataset descriptions and query. Front Neuroinform 2023; 17:1174156. [PMID: 37533796 PMCID: PMC10392125 DOI: 10.3389/fninf.2023.1174156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 06/27/2023] [Indexed: 08/04/2023] Open
Abstract
The biomedical research community is motivated to share and reuse data from studies and projects by funding agencies and publishers. Effectively combining and reusing neuroimaging data from publicly available datasets requires the capability to query across datasets in order to identify cohorts that match both neuroimaging and clinical/behavioral data criteria. Critical barriers to operationalizing such queries include, in part, the broad use of undefined study variables with limited or no annotations that make it difficult to understand the data available without significant interaction with the original authors. Using the Brain Imaging Data Structure (BIDS) to organize neuroimaging data has made querying across studies for specific image types possible at scale. However, in BIDS, beyond file naming and tightly controlled imaging directory structures, there are very few constraints on ancillary variable naming/meaning or experiment-specific metadata. In this work, we present NIDM-Terms, a set of user-friendly terminology management tools and associated software to better manage individual lab terminologies and help with annotating BIDS datasets. Using these tools to annotate BIDS data with a Neuroimaging Data Model (NIDM) semantic web representation enables queries across datasets to identify cohorts with specific neuroimaging and clinical/behavioral measurements. This manuscript describes the overall informatics structures and demonstrates the use of tools to annotate BIDS datasets to perform integrated cross-cohort queries.
|
3
|
Toxicology knowledge graph for structural birth defects. Communications Medicine 2023; 3:98. [PMID: 37460679 DOI: 10.1038/s43856-023-00329-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/20/2022] [Accepted: 06/29/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND Birth defects are functional and structural abnormalities that impact about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered data from drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms for drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg . This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects with the potential of predicting the likelihood of genes and preclinical small molecules to induce birth defects.
|
4
|
Extending and using anatomical vocabularies in the stimulating peripheral activity to relieve conditions project. Front Neuroinform 2022; 16:819198. [PMID: 36090663 PMCID: PMC9449460 DOI: 10.3389/fninf.2022.819198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2021] [Accepted: 07/18/2022] [Indexed: 11/25/2022] Open
Abstract
The stimulating peripheral activity to relieve conditions (SPARC) program is a US National Institutes of Health-funded effort to improve our understanding of the neural circuitry of the autonomic nervous system (ANS) in support of bioelectronic medicine. As part of this effort, the SPARC project is generating multi-species, multimodal data, models, simulations, and anatomical maps supported by a comprehensive knowledge base of autonomic circuitry. To facilitate the organization of and integration across multi-faceted SPARC data and models, SPARC is implementing the findable, accessible, interoperable, and reusable (FAIR) data principles to ensure that all SPARC products are findable, accessible, interoperable, and reusable. We are therefore annotating and describing all products with a common FAIR vocabulary. The SPARC Vocabulary is built from a set of community ontologies covering major domains relevant to SPARC, including anatomy, physiology, experimental techniques, and molecules. The SPARC Vocabulary is incorporated into tools researchers use to segment and annotate their data, facilitating the application of these ontologies for annotation of research data. However, since investigators perform deep annotations on experimental data, not all terms and relationships are available in community ontologies. We therefore implemented a term management and vocabulary extension pipeline where SPARC researchers may extend the SPARC Vocabulary using InterLex, an online vocabulary management system. To ensure the quality of contributed terms, we have set up a curated term request and review pipeline specifically for anatomical terms involving expert review. Accepted terms are added to the SPARC Vocabulary and, when appropriate, contributed back to community ontologies to enhance ANS coverage. Here, we provide an overview of the SPARC Vocabulary, the infrastructure and process for implementing the term management and review pipeline. 
In an analysis of >300 anatomical contributed terms, the majority represented composite terms that necessitated combining terms within and across existing ontologies. Although these terms are not good candidates for community ontologies, they can be linked to structures contained within these ontologies. We conclude that the term request pipeline serves as a useful adjunct to community ontologies for annotating experimental data and increases the FAIRness of SPARC data.
|
5
|
Transcriptional regulatory networks of circulating immune cells in type 1 diabetes: A community knowledgebase. iScience 2022; 25:104581. [PMID: 35832893 PMCID: PMC9272393 DOI: 10.1016/j.isci.2022.104581] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/07/2022] [Revised: 06/01/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022] Open
Abstract
Investigator-generated transcriptomic datasets interrogating circulating immune cell (CIC) gene expression in clinical type 1 diabetes (T1D) have underappreciated re-use value. Here, we repurposed these datasets to create an open science environment for the generation of hypotheses around CIC signaling pathways whose gain or loss of function contributes to T1D pathogenesis. We firstly computed sets of genes that were preferentially induced or repressed in T1D CICs and validated these against community benchmarks. We then inferred and validated signaling node networks regulating expression of these gene sets, as well as differentially expressed genes in the original underlying T1D case:control datasets. In a set of three use cases, we demonstrated how informed integration of these networks with complementary digital resources supports substantive, actionable hypotheses around signaling pathway dysfunction in T1D CICs. Finally, we developed a federated, cloud-based web resource that exposes the entire data matrix for unrestricted access and re-use by the research community.
Re-use of transcriptomic type 1 diabetes (T1D) circulating immune cells (CICs) datasets
We generated transcriptional regulatory networks for T1D CICs
Use cases generate substantive hypotheses around signaling pathway dysfunction in T1D CICs
Networks are freely accessible on the web for re-use by the research community
|
6
|
dkNET Hypothesis Center: A Hub for FAIR Data, Online Resources and Hypothesis Generation. FASEB J 2022. [DOI: 10.1096/fasebj.2022.36.s1.r5782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
7
|
Empowering Data Sharing and Analytics through the Open Data Commons for Traumatic Brain Injury Research. Neurotrauma Rep 2022; 3:139-157. [PMID: 35403104 PMCID: PMC8985540 DOI: 10.1089/neur.2021.0061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023] Open
Abstract
Traumatic brain injury (TBI) is a major public health problem. Despite considerable research deciphering injury pathophysiology, precision therapies remain elusive. Here, we present large-scale data sharing and machine intelligence approaches to leverage TBI complexity. The Open Data Commons for TBI (ODC-TBI) is a community-centered repository emphasizing Findable, Accessible, Interoperable, and Reusable data sharing and publication with persistent identifiers. Importantly, the ODC-TBI implements data sharing of individual subject data, enabling pooling for high-sample-size, feature-rich data sets for machine learning analytics. We demonstrate pooled ODC-TBI data analyses, starting with descriptive analytics of subject-level data from 11 previously published articles (N = 1250 subjects) representing six distinct pre-clinical TBI models. Second, we perform unsupervised machine learning on multi-cohort data to identify persistent inflammatory patterns across different studies, improving experimental sensitivity for pro- versus anti-inflammation effects. As funders and journals increasingly mandate open data practices, ODC-TBI will create new scientific opportunities for researchers and facilitate multi-data-set, multi-dimensional analytics toward effective translation.
|
8
|
Abstract
In this perspective article, we consider the critical issue of data and other research object standardisation and, specifically, how international collaboration, and organizations such as the International Neuroinformatics Coordinating Facility (INCF) can encourage that emerging neuroscience data be Findable, Accessible, Interoperable, and Reusable (FAIR). As neuroscientists engaged in the sharing and integration of multi-modal and multiscale data, we see the current insufficiency of standards as a major impediment in the Interoperability and Reusability of research results. We call for increased international collaborative standardisation of neuroscience data to foster integration and efficient reuse of research objects.
|
9
|
A Standards Organization for Open and FAIR Neuroscience: the International Neuroinformatics Coordinating Facility. Neuroinformatics 2022; 20:25-36. [PMID: 33506383 PMCID: PMC9036053 DOI: 10.1007/s12021-020-09509-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Accepted: 12/28/2020] [Indexed: 01/07/2023]
Abstract
There is great need for coordination around standards and best practices in neuroscience to support efforts to make neuroscience a data-centric discipline. Major brain initiatives launched around the world are poised to generate huge stores of neuroscience data. At the same time, neuroscience, like many domains in biomedicine, is confronting the issues of transparency, rigor, and reproducibility. Widely used, validated standards and best practices are key to addressing the challenges in both big and small data science, as they are essential for integrating diverse data and for developing a robust, effective, and sustainable infrastructure to support open and reproducible neuroscience. However, developing community standards and gaining their adoption is difficult. The current landscape is characterized both by a lack of robust, validated standards and a plethora of overlapping, underdeveloped, untested and underutilized standards and best practices. The International Neuroinformatics Coordinating Facility (INCF), an independent organization dedicated to promoting data sharing through the coordination of infrastructure and standards, has recently implemented a formal procedure for evaluating and endorsing community standards and best practices in support of the FAIR principles. By formally serving as a standards organization dedicated to open and FAIR neuroscience, INCF helps evaluate, promulgate, and coordinate standards and best practices across neuroscience. Here, we provide an overview of the process and discuss how neuroscience can benefit from having a dedicated standards body.
|
10
|
The SPARC DRC: Building a Resource for the Autonomic Nervous System Community. Front Physiol 2021; 12:693735. [PMID: 34248680 PMCID: PMC8265045 DOI: 10.3389/fphys.2021.693735] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/11/2021] [Accepted: 05/28/2021] [Indexed: 02/01/2023] Open
Abstract
The Data and Resource Center (DRC) of the NIH-funded SPARC program is developing databases, connectivity maps, and simulation tools for the mammalian autonomic nervous system. The experimental data and mathematical models supplied to the DRC by the SPARC consortium are curated, annotated and semantically linked via a single knowledgebase. A data portal has been developed that allows discovery of data and models both via semantic search and via an interface that includes Google Map-like 2D flatmaps for displaying connectivity, and 3D anatomical organ scaffolds that provide a common coordinate framework for cross-species comparisons. We discuss examples that illustrate the data pipeline, which includes data upload, curation, segmentation (for image data), registration against the flatmaps and scaffolds, and finally display via the web portal, including the link to freely available online computational facilities that will enable neuromodulation hypotheses to be investigated by the autonomic neuroscience community and device manufacturers.
|
11
|
Antibody Watch: Text mining antibody specificity from the literature. PLoS Comput Biol 2021; 17:e1008967. [PMID: 34043624 PMCID: PMC8189493 DOI: 10.1371/journal.pcbi.1008967] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Revised: 06/09/2021] [Accepted: 04/15/2021] [Indexed: 11/21/2022] Open
Abstract
Antibodies are widely used reagents to test for expression of proteins and other antigens. However, they might not always reliably produce results when they do not specifically bind to the target proteins that their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, it is still challenging to cover the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating alerts to users of problematic antibodies by extracting statements about antibody specificity reported in the literature. The extracted alerts can be used to construct an “Antibody Watch” knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify if the snippets report that any antibody exhibits non-specificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform the classification task with 0.925 weighted F1-score, linking with 0.962 accuracy, and 0.914 weighted F1 when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.
Antibodies are widely used reagents to test for the expression of proteins. However, antibodies are also a known source of reproducibility problems in biomedicine, as specificity and other issues can complicate their use. Information about how antibodies perform for specific applications is scattered across the biomedical literature and multiple websites. To alert scientists to reported antibody issues, we develop text mining algorithms that can identify specificity issues reported in the literature. We developed a deep neural network algorithm and performed a feasibility study on 2,223 papers. We leveraged Research Resource Identifiers (RRIDs), unique identifiers for antibodies and other biomedical resources, to match extracted specificity issues with particular antibodies. The results show that our system, called “Antibody Watch,” can accurately perform specificity issue identification and RRID association with a weighted F-score over 0.914. From our test corpus, we identified 37 antibodies with 68 nonspecific issue statements. With Antibody Watch, for example, if one were looking for an antibody targeting beta-Amyloid 1–16, from 74 antibodies at dkNET Resource Reports (on 10/2/20), one would be alerted that “some non-specific bands were detected at 55 kDa in both WT and APP/PS1 mice with the 6E10 antibody…”
|
12
|
Improving Scientific Rigor and Reproducibility: Check Research Resources Information When Planning Experiments Using dkNET. FASEB J 2020. [DOI: 10.1096/fasebj.2020.34.s1.04748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
13
|
FAIR SCI Ahead: The Evolution of the Open Data Commons for Pre-Clinical Spinal Cord Injury Research. J Neurotrauma 2020; 37:831-838. [PMID: 31608767 PMCID: PMC7071068 DOI: 10.1089/neu.2019.6674] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Abstract
Over the last 5 years, multiple stakeholders in the field of spinal cord injury (SCI) research have initiated efforts to promote publication standards and enable sharing of experimental data. In 2016, the National Institutes of Health/National Institute of Neurological Disorders and Stroke hosted representatives from the SCI community to streamline these efforts and discuss the future of data sharing in the field according to the FAIR (Findable, Accessible, Interoperable and Reusable) data stewardship principles. As a next step, a multi-stakeholder group hosted a 2017 symposium in Washington, DC entitled "FAIR SCI Ahead: the Evolution of the Open Data Commons for Spinal Cord Injury research." The goal of this meeting was to receive feedback from the community regarding infrastructure, policies, and organization of a community-governed Open Data Commons (ODC) for pre-clinical SCI research. Here, we summarize the policy outcomes of this meeting and report on progress implementing these policies in the form of a digital ecosystem: the Open Data Commons for Spinal Cord Injury (ODC-SCI.org). ODC-SCI enables data management, harmonization, and controlled sharing of data in a manner consistent with the well-established norms of scholarly publication. Specifically, ODC-SCI is organized around virtual "laboratories" with the ability to share data within each of three distinct data-sharing spaces: within the laboratory, across verified laboratories, or publicly under a Creative Commons license (CC-BY 4.0) with a digital object identifier that enables data citation. The ODC-SCI implements FAIR data sharing and enables pooled data-driven discovery while crediting the generators of valuable SCI data.
|
14
|
Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts. Comput Sci Eng 2020. [DOI: 10.1109/mcse.2019.2952838] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
|
15
|
Bio-AnswerFinder: a system to find answers to questions from biomedical texts. Database (Oxford) 2020; 2020:5700339. [PMID: 31925435 PMCID: PMC7053013 DOI: 10.1093/database/baz137] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/25/2019] [Revised: 10/04/2019] [Accepted: 11/07/2019] [Indexed: 11/15/2022]
Abstract
The ever accelerating pace of biomedical research results in corresponding acceleration in the volume of biomedical literature created. Since new research builds upon existing knowledge, the rate of increase in the available knowledge encoded in biomedical literature makes the easy access to that implicit knowledge more vital over time. Toward the goal of making implicit knowledge in the biomedical literature easily accessible to biomedical researchers, we introduce a question answering system called Bio-AnswerFinder. Bio-AnswerFinder uses a weighted-relaxed word mover's distance based similarity on word/phrase embeddings learned from PubMed abstracts to rank answers after question focus entity type filtering. Our approach retrieves relevant documents iteratively via enhanced keyword queries from a traditional search engine. To improve document retrieval performance, we introduced a supervised long short term memory neural network to select keywords from the question to facilitate iterative keyword search. Our unsupervised baseline system achieves a mean reciprocal rank score of 0.46 and Precision@1 of 0.32 on 936 questions from BioASQ. The answer sentences are further ranked by a fine-tuned bidirectional encoder representation from transformers (BERT) classifier trained using 100 answer candidate sentences per question for 492 BioASQ questions. To test ranking performance, we report a blind test on 100 questions that three independent annotators scored. These experts preferred BERT based reranking with 7% improvement on MRR and 13% improvement on Precision@1 scores on average.
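The two ranking metrics quoted in this abstract, mean reciprocal rank (MRR) and Precision@1, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy relevance lists are assumptions made for illustration, and each question is represented only as a ranked list of booleans marking whether the answer at that rank is correct.

```python
from typing import List, Sequence

def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: List[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not runs:
        raise ValueError("at least one question is required")
    return sum(reciprocal_rank(run) for run in runs) / len(runs)

def precision_at_1(runs: List[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    if not runs:
        raise ValueError("at least one question is required")
    return sum(1 for run in runs if run and run[0]) / len(runs)

# Toy data (hypothetical): three questions, answers already ranked by a system.
toy_runs = [
    [True, False],                 # correct answer at rank 1
    [False, True],                 # correct answer at rank 2
    [False, False, False, True],   # correct answer at rank 4
]
mrr = mean_reciprocal_rank(toy_runs)   # (1 + 1/2 + 1/4) / 3
p_at_1 = precision_at_1(toy_runs)      # 1 of 3 questions correct at rank 1
```

MRR rewards a system for placing the first correct answer high in the list, while Precision@1 only counts questions whose top answer is correct, which is why the abstract reports the two separately.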
|
16
|
dkNET (NIDDK Information Network): Research tools that assist scientists in improving the rigor and reproducibility of their research. FASEB J 2019. [DOI: 10.1096/fasebj.2019.33.1_supplement.802.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
17
|
Everything Matters: The ReproNim Perspective on Reproducible Neuroimaging. Front Neuroinform 2019; 13:1. [PMID: 30792636 PMCID: PMC6374302 DOI: 10.3389/fninf.2019.00001] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/21/2018] [Accepted: 01/17/2019] [Indexed: 11/30/2022] Open
Abstract
There has been a recent major upsurge in the concerns about reproducibility in many areas of science. Within the neuroimaging domain, one approach to promoting reproducibility is to target the re-executability of the publication. The information supporting such re-executability can enable the detailed examination of how an initial finding generalizes across changes in the processing approach, and sampled population, in a controlled scientific fashion. ReproNim: A Center for Reproducible Neuroimaging Computation is a recently funded initiative that seeks to facilitate the “last mile” implementations of core re-executability tools in order to reduce the accessibility barrier and increase adoption of standards and best practices at the neuroimaging research laboratory level. In this report, we summarize the overall approach and tools we have developed in this domain.
|
18
|
Uniform resolution of compact identifiers for biomedical data. Sci Data 2018; 5:180029. [PMID: 29737976 PMCID: PMC5944906 DOI: 10.1038/sdata.2018.29] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 09/11/2017] [Accepted: 01/26/2018] [Indexed: 11/09/2022] Open
Abstract
Most biomedical data repositories issue locally unique accession numbers, but do not provide globally unique, machine-resolvable, persistent identifiers for their datasets, as required by publishers wishing to implement data citation in accordance with widely accepted principles. Local accessions may, however, be prefixed with a namespace identifier, providing global uniqueness. Such "compact identifiers" have been widely used in biomedical informatics to support global resource identification with local identifier assignment. We report here on our project to provide robust support for machine-resolvable, persistent compact identifiers in biomedical data citation, by harmonizing the Identifiers.org and N2T.net (Name-To-Thing) meta-resolvers and extending their capabilities. Identifiers.org services hosted at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), and N2T.net services hosted at the California Digital Library (CDL), can now resolve any given identifier from over 600 source databases to its original source on the Web, using a common registry of prefix-based redirection rules. We believe these services will be of significant help to publishers and others implementing persistent, machine-resolvable citation of research data.
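The idea of resolving a compact identifier (a namespace prefix plus a local accession) through a registry of prefix-based redirection rules can be sketched as follows. This is only an illustration under stated assumptions: the prefixes, URL templates, and function names are hypothetical, and the sketch does not reproduce the actual Identifiers.org or N2T.net registry or API.

```python
from typing import Dict, Tuple

# Hypothetical registry mapping a namespace prefix to a URL template.
# The real meta-resolvers share a much larger, curated registry.
REGISTRY: Dict[str, str] = {
    "exampledb": "https://exampledb.example.org/record/{accession}",
    "demo": "https://demo.example.org/entry?id={accession}",
}

def parse_compact_id(compact_id: str) -> Tuple[str, str]:
    """Split 'prefix:accession' at the first colon; prefixes are case-insensitive here."""
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix.lower(), accession

def resolve(compact_id: str, registry: Dict[str, str] = REGISTRY) -> str:
    """Apply the redirection rule registered for the prefix to the local accession."""
    prefix, accession = parse_compact_id(compact_id)
    if prefix not in registry:
        raise LookupError(f"unknown prefix: {prefix!r}")
    return registry[prefix].format(accession=accession)

# Usage with the hypothetical registry above.
url = resolve("exampledb:ABC123")
```

Splitting at the first colon keeps any later colons inside the local accession, so global uniqueness comes from the prefix alone while each repository keeps assigning its own accessions.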
|
19
|
DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022] Open
Abstract
Objective Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
|
20
|
Foundry: a message-oriented, horizontally scalable ETL system for scientific data integration and enhancement. Database (Oxford) 2018; 2018:5255189. [PMID: 30576493 PMCID: PMC6301337 DOI: 10.1093/database/bay130] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/14/2018] [Revised: 10/18/2018] [Accepted: 11/14/2018] [Indexed: 11/12/2022]
Abstract
Data generated by scientific research enables further advancement in science through reanalyses and pooling of data for novel analyses. With the increasing amounts of scientific data generated by biomedical research providing researchers with more data than they have ever had access to, finding the data matching the researchers' requirements continues to be a major challenge and will only grow more challenging as more data is produced and shared. In this paper, we introduce a horizontally scalable distributed extract-transform-load system to tackle scientific data aggregation, transformation and enhancement for scientific data discovery and retrieval. We also introduce a data transformation language for biomedical curators allowing for the transformation and combination of data/metadata from heterogeneous data sources. Applicability of the system for scientific data is illustrated in biomedical and earth science domains.
21
22
DATS, the data tag suite to enable discoverability of datasets. Sci Data 2017; 4:170059. [PMID: 28585923 PMCID: PMC5460592 DOI: 10.1038/sdata.2017.59] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 01/27/2017] [Accepted: 03/30/2017] [Indexed: 11/21/2022] Open
Abstract
Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
23
Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature. PLoS One 2016; 11:e0146300. [PMID: 26730820 PMCID: PMC5156472 DOI: 10.1371/journal.pone.0146300] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/02/2015] [Accepted: 12/15/2015] [Indexed: 11/19/2022] Open
Abstract
The NIF Registry developed and maintained by the Neuroscience Information Framework is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the (Web). RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor from papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
24
The NIDDK Information Network: A Community Portal for Finding Data, Materials, and Tools for Researchers Studying Diabetes, Digestive, and Kidney Diseases. PLoS One 2015; 10:e0136206. [PMID: 26393351 PMCID: PMC4578941 DOI: 10.1371/journal.pone.0136206] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/17/2015] [Accepted: 07/30/2015] [Indexed: 11/19/2022] Open
Abstract
The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government-funded databases, maintained by agencies such as the European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the "hidden" web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a "search engine for data", searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET is the centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. We show how dkNET can be used to address a variety of use cases that involve searching for research resources.
25
Extending the NIF DISCO framework to automate complex workflow: coordinating the harvest and integration of data from diverse neuroscience information resources. Front Neuroinform 2014; 8:58. [PMID: 25018728 PMCID: PMC4071641 DOI: 10.3389/fninf.2014.00058] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/22/2014] [Accepted: 05/06/2014] [Indexed: 11/15/2022] Open
Abstract
This paper describes how DISCO, the data aggregator that supports the Neuroscience Information Framework (NIF), has been extended to play a central role in automating the complex workflow required to support and coordinate the NIF’s data integration capabilities. The NIF is an NIH Neuroscience Blueprint initiative designed to help researchers access the wealth of data related to the neurosciences available via the Internet. A central component is the NIF Federation, a searchable database that currently contains data from 231 data and information resources regularly harvested, updated, and warehoused in the DISCO system. In the past several years, DISCO has greatly extended its functionality and has evolved to play a central role in automating the complex, ongoing process of harvesting, validating, integrating, and displaying neuroscience data from a growing set of participating resources. This paper provides an overview of DISCO’s current capabilities and discusses a number of the challenges and future directions related to the process of coordinating the integration of neuroscience data within the NIF Federation.
26
A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database (Oxford) 2012; 2012:bas005. [PMID: 22434839 PMCID: PMC3308161 DOI: 10.1093/database/bas005] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (e.g. reagents, software tools). Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. Database URL: http://neuinfo.org
27
Abstract
We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It provides integration with external projects such as the Whole Brain Catalog (WBC) and Neuroscience Information Framework (NIF), which benefit from NCMIR data. We also report on specific workflows which spawn from main workflows and perform data fusion and orchestration of Web services specific for the NIF project. This “Brain data flow” presents a user with categorized information about sources that have information on various brain regions.
28
Abstract
The XCEDE (XML-based Clinical and Experimental Data Exchange) XML schema, developed by members of the BIRN (Biomedical Informatics Research Network), provides an extensive metadata hierarchy for storing, describing and documenting the data generated by scientific studies. Currently at version 2.0, the XCEDE schema serves as a specification for the exchange of scientific data between databases, analysis tools, and web services. It provides a structured metadata hierarchy, storing information relevant to various aspects of an experiment (project, subject, protocol, etc.). Each hierarchy level also provides for the storage of data provenance information allowing for a traceable record of processing and/or changes to the underlying data. The schema is extensible to support the needs of various data modalities and to express types of data not originally envisioned by the developers. The latest version of the XCEDE schema and manual are available from http://www.xcede.org/.
29
Abstract
Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.
30
Derived Data Storage and Exchange Workflow for Large-Scale Neuroimaging Analyses on the BIRN Grid. Front Neuroinform 2009; 3:30. [PMID: 19826494 PMCID: PMC2759340 DOI: 10.3389/neuro.11.030.2009] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 04/01/2009] [Accepted: 08/16/2009] [Indexed: 11/13/2022] Open
Abstract
Organizing and annotating biomedical data in structured ways has gained much interest and focus in the last 30 years. Driven by decreases in digital storage costs and advances in genetics sequencing, imaging, electronic data collection, and microarray technologies, data is being collected at an ever increasing rate. The need to store and exchange data in meaningful ways in support of data analysis, hypothesis testing and future collaborative use is pervasive. Because trans-disciplinary projects rely on effective use of data from many domains, there is a genuine interest in the informatics community in how best to store and combine this data while maintaining a high level of data quality and documentation. The difficulties in sharing and combining raw data become amplified after post-processing and/or data analysis in which the new dataset of interest is a function of the original data and may have been collected by multiple collaborating sites. Simple meta-data, documenting which subject and version of data were used for a particular analysis, becomes complicated by the heterogeneity of the collecting sites yet is critically important to the interpretation and reuse of derived results. This manuscript presents a case study of using the XML-Based Clinical Experiment Data Exchange (XCEDE) schema and the Human Imaging Database (HID) in the Biomedical Informatics Research Network's (BIRN) distributed environment to document and exchange derived data. The discussion includes an overview of the data structures used in both the XML and the database representations, insight into the design considerations, and the extensibility of the design to support additional analysis streams.
31
The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC). Neuroimage 2009. [DOI: 10.1016/s1053-8119(09)70519-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022] Open
32
Mediator infrastructure for information integration and semantic data integration environment for biomedical research. Methods Mol Biol 2009; 569:33-53. [PMID: 19623485 DOI: 10.1007/978-1-59745-524-4_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/28/2023]
Abstract
This paper presents current progress in the development of a semantic data integration environment, which is part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the development of a cyberinfrastructure for biomedical research that supports advanced data acquisition, data storage, data management, data integration, data mining, data visualization, and other computing and information processing services over the Internet. Each participating institution maintains storage of its experimental or computationally derived data. A mediator-based data integration system performs semantic integration over the databases to enable researchers to perform analyses based on larger and broader datasets than would be available from any single institution's data. This paper describes the recent revision of the system architecture, implementation, and capabilities of the semantically based data integration environment for BIRN.
33
Data federation in the Biomedical Informatics Research Network: tools for semantic annotation and query of distributed multiscale brain data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008:1220. [PMID: 18999211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Accepted: 06/17/2008] [Indexed: 05/27/2023]
Abstract
The broadly defined mission of the Biomedical Informatics Research Network (BIRN, www.nbirn.net) is to better understand the causes of human disease and the specific ways in which animal models inform that understanding. To construct the community-wide infrastructure for gathering, organizing and managing this knowledge, BIRN is developing a federated architecture for linking multiple databases across sites contributing data and knowledge. Navigating across these distributed data sources requires a shared semantic scheme and supporting software framework to actively link the disparate repositories. At the core of this knowledge organization is BIRNLex, a formally-represented ontology facilitating data exchange. Source curators enable database interoperability by mapping their schema and data to BIRNLex semantic classes, thereby providing a means to cast BIRNLex-based queries against specific data sources in the federation. We will illustrate use of the source registration, term mapping, and query tools.
34
Federated access to heterogeneous information resources in the Neuroscience Information Framework (NIF). Neuroinformatics 2008; 6:205-17. [PMID: 18958629 DOI: 10.1007/s12021-008-9033-y] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 09/06/2008] [Accepted: 09/08/2008] [Indexed: 10/21/2022]
Abstract
The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop-shop for Neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user would provide only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should find a record containing synonyms of the term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard) constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources including relational databases, web sites, XML documents and full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov.
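The idea of treating a keyword as a concept, so that a query also matches records containing synonyms, can be illustrated with a minimal sketch. This is not the NIF or OntoQuest implementation: the `CONCEPTS` table is a hypothetical stand-in for an ontology such as NIFSTD, and the function names and sample records are invented for illustration.

```python
# Hypothetical mini "ontology": each concept maps to the labels that name it.
CONCEPTS = {
    "neuron": {"neuron", "nerve cell", "neurone"},
    "purkinje cell": {"purkinje cell", "purkinje neuron"},
}


def expand_term(term):
    """Return every label of the concept that `term` names, or the term alone."""
    term = term.lower()
    for labels in CONCEPTS.values():
        if term in labels:
            return labels
    return {term}


def search(term, records):
    """Return records whose text contains the term or any of its synonyms."""
    labels = expand_term(term)
    # Plain substring matching keeps the sketch short; a real system would
    # tokenize the text and consult the ontology's class hierarchy as well.
    return [r for r in records if any(label in r.lower() for label in labels)]


records = [
    "Morphology of a cortical nerve cell",
    "Purkinje neuron firing rates",
    "Astrocyte calcium imaging",
]

# The first record matches through the synonym "nerve cell",
# the second through the literal substring "neuron".
print(search("neuron", records))
```

A keyword-only search for "neuron" would miss the first record; expanding the query to the concept's labels is what recovers it.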
35
A national human neuroimaging collaboratory enabled by the Biomedical Informatics Research Network (BIRN). IEEE Trans Inf Technol Biomed 2008; 12:162-72. [PMID: 18348946 DOI: 10.1109/titb.2008.917893] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Indexed: 11/06/2022]
Abstract
The aggregation of imaging, clinical, and behavioral data from multiple independent institutions and researchers presents both a great opportunity for biomedical research and a formidable challenge. Many research groups have well-established data collection and analysis procedures, as well as data and metadata format requirements that are particular to that group. Moreover, the types of data and metadata collected are quite diverse, including image, physiological, and behavioral data, as well as descriptions of experimental design, and preprocessing and analysis methods. Each of these types of data utilizes a variety of software tools for collection, storage, and processing. Furthermore, sites are reluctant to release control over the distribution of and access to the data and the tools. To address these needs, the Biomedical Informatics Research Network (BIRN) has developed a federated and distributed infrastructure for the storage, retrieval, analysis, and documentation of biomedical imaging data. The infrastructure consists of distributed data collections hosted on dedicated storage and computational resources located at each participating site, a federated data management system and data integration environment, an Extensible Markup Language (XML) schema for data exchange, and analysis pipelines, designed to leverage both the distributed data management environment and the available grid computing resources.
36
A web portal that enables collaborative use of advanced medical image processing and informatics tools through the Biomedical Informatics Research Network (BIRN). AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2006; 2006:579-83. [PMID: 17238407 PMCID: PMC1839506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
Launched in 2001, the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) is an NIH-NCRR initiative that enables researchers to collaborate in an environment for biomedical research and clinical information management, focused particularly upon medical imaging. Although it supports a vast array of programs to transform and calculate upon medical images, three fundamental problems emerged that inhibited collaborations. The first was that the complexity of the programs, and at times legal restrictions, combined to prohibit these programs from being accessible to all members of the teams and indeed the general researcher, although this was a fundamental mission of the BIRN. Second, the calculations that needed to be performed were very complex, and required many steps that often needed to be performed by different groups. Third, many of the analysis programs were not interoperable. These problems combined to create tremendous logistical problems. The solution was to create a portal-based workflow application that allowed the complex, collaborative tasks to take place and enabled new kinds of calculations that had not previously been practical.
37
A General XML Schema and SPM Toolbox for Storage of Neuro-Imaging Results and Anatomical Labels. Neuroinformatics 2006; 4:199-212. [PMID: 16845169 DOI: 10.1385/ni:4:2:199] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 11/30/1999] [Revised: 11/30/1999] [Accepted: 11/30/1999] [Indexed: 11/11/2022]
Abstract
With the increased frequency of multisite, large-scale collaborative neuro-imaging studies, the need for a general, self-documenting framework for the storage and retrieval of activation maps and anatomical labels becomes evident. To address this need, we have developed an extensible markup language (XML) schema and associated tools for the storage of neuro-imaging activation maps and anatomical labels. This schema, as part of the XML-based Clinical Experiment Data Exchange (XCEDE) schema, provides storage capabilities for analysis annotations, activation threshold parameters, and cluster and voxel-level statistics. Activation parameters contain information describing the threshold, degrees of freedom, FWHM smoothness, search volumes, voxel sizes, expected voxels per cluster, and expected number of clusters in the statistical map. Cluster and voxel statistics can be stored along with the coordinates, threshold, and anatomical label information. Multiple threshold types can be documented for a given cluster or voxel along with the uncorrected and corrected probability values. Multiple atlases can be used to generate anatomical labels, which are stored for each significant voxel or cluster. Additionally, a toolbox for Statistical Parametric Mapping software (http://www.fil.ion.ucl.ac.uk/spm/) was created to capture the results from activation maps using the XML schema that supports both SPM99 and SPM2 versions (http://nbirn.net/Resources/Users/Applications/xcede/SPM_XMLTools.htm). Support for anatomical labeling is available via the Talairach Daemon (http://ric.uthscsa.edu/projects/talairachdaemon.html) and Automated Anatomical Labeling (http://www.cyceron.fr/freeware/).
38
Biomedical informatics research network: building a national collaboratory to hasten the derivation of new understanding and treatment of disease. Stud Health Technol Inform 2005; 112:100-9. [PMID: 15923720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Through support from the National Institutes of Health's National Center for Research Resources, the Biomedical Informatics Research Network (BIRN) is pioneering the use of advanced cyberinfrastructure for medical research. By synchronizing developments in advanced wide area networking, distributed computing, distributed database federation, and other emerging capabilities of e-science, the BIRN has created a collaborative environment that is paving the way for biomedical research and clinical information management. The BIRN Coordinating Center (BIRN-CC) is orchestrating the development and deployment of key infrastructure components for immediate and long-range support of biomedical and clinical research being pursued by domain scientists in three neuroimaging test beds.
39
Abstract
We present issues arising when trying to formalize disease maps, i.e. ontologies to represent the terminological relationships among concepts necessary to construct a knowledge-base of neurological disorders. These disease maps are being created in the context of a large-scale data mediation system being created for the Biomedical Informatics Research Network (BIRN). The BIRN is a multi-university consortium collaborating to establish a large-scale data and computational grid around neuroimaging data, collected across multiple scales. Test bed projects within BIRN involve both animal and human studies of Alzheimer's disease, Parkinson's disease and schizophrenia. Incorporating both the static 'terminological' relationships and dynamic processes, disease maps are being created to encapsulate a comprehensive theory of a disease. Terms within the disease map can also be connected to the relevant terms within other ontologies (e.g. the Unified Medical Language System), in order to allow the disease map management system to derive relationships between a larger set of terms than what is contained within the disease map itself. In this paper, we use the basic structure of a disease map we are developing for Parkinson's disease to illustrate our initial formalization for disease maps.
40
The Functional Magnetic Resonance Imaging Data Center (fMRIDC): the challenges and rewards of large-scale databasing of neuroimaging studies. Philos Trans R Soc Lond B Biol Sci 2001; 356:1323-39. [PMID: 11545705 PMCID: PMC1088517 DOI: 10.1098/rstb.2001.0916] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022] Open
Abstract
The Functional Magnetic Resonance Imaging Data Center (fMRIDC) (http://www.fmridc.org) was established in the autumn of 1999 with the objective of creating a mechanism by which members of the neuroscientific community may more easily share functional neuroimaging data. Examples in other sciences offer proof of the usefulness and benefit that sharing data provides through encouraging growth and development in those fields. By building a publicly accessible repository of raw data from peer-reviewed studies, the Data Center hopes to create a similarly successful environment for the neurosciences. In this article, we discuss the continuum of data-sharing efforts and provide an overview of the scientific and practical difficulties inherent in managing various fMRI data-sharing approaches. Next, we detail the organization, design and foundation of the fMRIDC, ranging from its current capabilities to the issues involved in submitting and requesting data. We discuss how a publicly accessible database enables other fields to develop relevant tools that can aid in the growth of understanding of cognitive processes. Information retrieval and meta-analytic techniques can be used to search, sort and categorize study information with a view towards subjecting study data to secondary 'meta-' and 'mega-analyses'. In addition, we detail the technical and policy challenges that have had to be addressed in the formation of the Data Center. Among others, these include: human subject confidentiality issues; ensuring investigators' rights; heterogeneous data description and organization; development of search tools; and data transfer issues. We conclude with comments concerning the future of the fMRIDC effort, its role in promoting the sharing of neuroscientific data, and how this may alter the manner in which studies are published.
|
41
|
|
42
|
Functional anatomy of nonvisual feedback loops during reaching: a positron emission tomography study. J Neurosci 2001; 21:2919-28. [PMID: 11306644 PMCID: PMC6762522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
Abstract
Reaching movements performed without vision of the moving limb are continuously monitored, during their execution, by feedback loops (designated nonvisual). In this study, we investigated the functional anatomy of these nonvisual loops using positron emission tomography (PET). Seven subjects had to "look at" (eye) or "look and point to" (eye-arm) visual targets whose location either remained stationary or changed undetectably during the ocular saccade (when vision is suppressed). Slightly changing the target location during gaze shift causes an increase in the amount of correction to be generated. Functional anatomy of nonvisual feedback loops was identified by comparing the reaching condition involving large corrections (jump) with the reaching condition involving small corrections (stationary), after subtracting the activations associated with saccadic movements and hand movement planning [(eye-arm-jumping minus eye-jumping) minus (eye-arm-stationary minus eye-stationary)]. Behavioral data confirmed that the subjects were both accurate at reaching to the stationary targets and able to update their movement smoothly and early in response to the target jump. PET difference images showed that these corrections were mediated by a restricted network involving the left posterior parietal cortex, the right anterior intermediate cerebellum, and the left primary motor cortex. These results are consistent with our knowledge of the functional properties of these areas and more generally with models emphasizing parietal-cerebellar circuits for processing a dynamic motor error signal.
|
43
|
Abstract
It is known that the saccadic system shows adaptive changes when the command sent to the extraocular muscles is inappropriate. Despite an abundance of supportive psychophysical investigations, the neurophysiological substrate of this process is still debated. The present study addresses this issue using H2(15)O positron emission tomography (PET). We contrasted three conditions in which healthy human subjects were required to perform saccadic eye movements toward peripheral visual targets. Two conditions involved a modification of the target location during the course of the initial saccade, when there is suppression of visual perception. In the RAND condition, intra-saccadic target displacement was random from trial-to-trial, precluding any systematic modification of the primary saccade amplitude. In the ADAPT condition, intra-saccadic target displacement was uniform, causing adaptive modification of the primary saccade amplitude. In the third condition (stationary, STAT), the target remained at the same location during the entire trial. Difference images reflecting regional cerebral-blood-flow changes attributable to the process of saccadic adaptation (ADAPT minus RAND; ADAPT minus STAT) showed a selective activation in the oculomotor cerebellar vermis (OCV; lobules VI and VII). This finding is consistent with neurophysiological studies in monkeys. Additional analyses indicated that the cerebellar activation was not related to kinematic factors, and that the absence of significant activation within the frontal eye fields (FEF) or the superior colliculus (SC) did not represent a false negative inference. Besides the contribution of the OCV to saccadic adaptation, we also observed, in the RAND condition, that the saccade amplitude was significantly larger when the previous trial involved a forward jump than when the previous trial involved a backward jump. This observation indicates that saccade accuracy is constantly monitored on a trial-to-trial basis. 
Behavioral measurements and PET observations (RAND minus STAT) suggest that this single-trial control of saccade amplitude may be functionally distinct from the process of saccadic adaptation.
|
44
|
Essential neuronal pathways for reflex and conditioned response initiation in an intracerebellar stimulation paradigm and the impact of unconditioned stimulus preexposure on learning rate. Neurobiol Learn Mem 1999; 71:167-93. [PMID: 10082638 DOI: 10.1006/nlme.1998.3872] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
It has been demonstrated previously that pairing of tone CS and intracerebellar stimulation of lobule HVI white matter as the US produces conditioning that is robust and in many ways similar to that obtained with an airpuff US. The first study in this report addressed the effect of interpositus lesions on conditioned performance in rabbits trained with white matter stimulation as the US. It was found that interpositus lesions effectively eliminated the CR irrespective of the behavioral response measured. In addition, it was shown that the interpositus lesions also abolished the UR, providing strong evidence that the effects of the electrical stimulation were confined to the cerebellum and did not require the activation of brainstem structures. The second experiment examined performance on US-alone trials of varying durations. Response initiation within 100 ms of the US onset, regardless of US duration, indicated that reflex generation could not be due to rebound excitation of the interpositus following termination of Purkinje cell inhibition of that structure but instead likely reflects orthodromic activation of interpositus neurons via climbing fiber and/or mossy fiber collaterals. The impact of US preexposure on associative conditioning in this paradigm was also determined. Animals that received 108 US-alone trials were massively impaired during subsequent training compared to rabbits that received fewer than 12 US-alone trials.
|
45
|
Abstract
This chapter reviews evidence demonstrating the essential role of the cerebellum and its associated circuitry in the learning and memory of classical conditioning of discrete behavioral responses (e.g., eyeblink, limb flexion, head turn). It now seems conclusive that the memory traces for this basic category of associative learning are formed and stored in the cerebellum. Lesion, neuronal recording, electrical microstimulation, and anatomical procedures have been used to identify the essential conditioned stimulus (CS) circuit, including the pontine mossy fiber projections to the cerebellum; the essential unconditioned stimulus (US) reinforcing or teaching circuit, including neurons in the inferior olive (dorsal accessory olive) projecting to the cerebellum as climbing fibers; and the essential conditioned response (CR) circuit, including the interpositus nucleus, its projection via the superior cerebellar peduncle to the magnocellular red nucleus, and rubral projections to premotor and motor nuclei. Each major component of the eyeblink CR circuit was reversibly inactivated both in trained animals and over the course of training. In all cases in trained animals, inactivation abolished the CR (and the UR as well when motor nuclei were inactivated). When animals were trained during inactivation (and not exhibiting CRs) and then tested without inactivation, animals with inactivation of the motor nuclei, red nucleus, and superior peduncle had fully learned, whereas animals with inactivation of a very localized region of the cerebellum (anterior interpositus and overlying cortex) had not learned at all. Consequently, the memory traces are formed and stored in the cerebellum. Several alternative possibilities are considered and ruled out. 
Both the cerebellar cortex and the interpositus nucleus are involved in the memory storage process, suggesting that a phenomenon like long-term depression (LTD) is involved in the cerebellar cortex and long-term potentiation (LTP) is involved in the interpositus. The experimental findings reviewed in this chapter provide perhaps the first conclusive evidence for the localization of a basic form of memory storage to a particular brain region, namely the cerebellum, and indicate that the cerebellum is indeed a cognitive machine.
|