1. Bayesian Optimization of Neurostimulation (BOONStim). bioRxiv 2024:2024.03.08.584169. [PMID: 38559269 PMCID: PMC10979934 DOI: 10.1101/2024.03.08.584169]
Abstract
BACKGROUND Transcranial magnetic stimulation (TMS) treatment response is influenced by individual variability in brain structure and function. Sophisticated yet user-friendly approaches that combine established functional magnetic resonance imaging (fMRI) analysis and TMS simulation tools to identify TMS targets are needed. OBJECTIVE The current study presents the development and validation of the Bayesian Optimization of Neuro-Stimulation (BOONStim) pipeline. METHODS BOONStim uses Bayesian optimization for individualized TMS targeting, automating interoperability between surface-based fMRI analytic tools and TMS electric field modeling. Bayesian optimization performance was evaluated in a sample dataset (N=10) using standard circular and functional connectivity-defined targets, and compared against grid optimization. RESULTS Bayesian optimization converged to similar levels of total electric field stimulation across targets in under 30 iterations, coming within 5% of the maxima detected by grid optimization while requiring less time. CONCLUSIONS BOONStim is a scalable, configurable, and user-friendly pipeline for individualized TMS targeting with quick turnaround.
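The convergence comparison described above can be sketched with a minimal 1-D Gaussian-process optimizer using an upper-confidence-bound acquisition; the objective function, kernel lengthscale, and iteration budget below are illustrative stand-ins, not BOONStim's actual electric-field model or settings.

```python
import numpy as np

def field_strength(x):
    # Illustrative smooth 1-D objective standing in for the simulated
    # electric-field magnitude at a candidate coil position (hypothetical).
    return np.exp(-((x - 0.62) ** 2) / 0.05) + 0.1 * np.sin(8 * x)

def rbf(a, b, lengthscale=0.1):
    # Squared-exponential kernel between two 1-D point sets
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def bayes_opt(f, n_iter=30, jitter=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    X = [float(x) for x in rng.uniform(0.0, 1.0, 3)]  # small random initial design
    y = [f(x) for x in X]
    candidates = np.linspace(0.0, 1.0, 201)  # acquisition evaluated on a fine grid
    for _ in range(n_iter):
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + jitter * np.eye(len(Xa))
        Ks = rbf(candidates, Xa)
        alpha = np.linalg.solve(K, ya)
        mu = Ks @ alpha                       # GP posterior mean at candidates
        v = np.linalg.solve(K, Ks.T)
        var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None)
        ucb = mu + 2.0 * np.sqrt(var)         # upper-confidence-bound acquisition
        x_next = candidates[np.argmax(ucb)]
        X.append(float(x_next))
        y.append(f(x_next))
    return max(y)

best_bo = bayes_opt(field_strength)
best_grid = max(field_strength(x) for x in np.linspace(0.0, 1.0, 201))
```

With a smooth objective like this, the loop typically lands within a few percent of the exhaustive grid maximum while evaluating far fewer points, which mirrors the abstract's 5%-of-maximum convergence result.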
2. Web-based processing of physiological noise in fMRI: addition of the PhysIO toolbox to CBRAIN. Front Neuroinform 2023; 17:1251023. [PMID: 37841811 PMCID: PMC10569687 DOI: 10.3389/fninf.2023.1251023]
Abstract
Neuroimaging research requires sophisticated tools for analyzing complex data, but efficiently leveraging these tools can be a major challenge, especially on large datasets. CBRAIN is a web-based platform designed to simplify the use and accessibility of neuroimaging research tools for large-scale, collaborative studies. In this paper, we describe how CBRAIN's unique features and infrastructure were leveraged to integrate TAPAS PhysIO, an open-source MATLAB toolbox for physiological noise modeling in fMRI data. This case study highlights three key elements of CBRAIN's infrastructure that enable streamlined, multimodal tool integration: a user-friendly GUI, a Brain Imaging Data Structure (BIDS) data-entry schema, and convenient in-browser visualization of results. By incorporating PhysIO into CBRAIN, we achieved significant improvements in the speed, ease of use, and scalability of physiological preprocessing. Researchers now have access to a uniform and intuitive interface for analyzing data, which facilitates remote and collaborative evaluation of results. With these improvements, CBRAIN aims to become an essential open-science tool for integrative neuroimaging research, supporting FAIR principles and enabling efficient workflows for complex analysis pipelines.
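BIDS layouts encode acquisition metadata directly in file names. As a rough illustration of the kind of data-entry schema mentioned above, here is a deliberately simplified check for one common functional-MRI naming pattern; the real BIDS specification covers many more entities and modalities.

```python
import re

# Simplified subset of BIDS functional-MRI naming:
# sub-<label>[_ses-<label>]_task-<label>[_run-<index>]_bold.nii[.gz]
BIDS_BOLD = re.compile(
    r"^sub-[a-zA-Z0-9]+"
    r"(_ses-[a-zA-Z0-9]+)?"
    r"_task-[a-zA-Z0-9]+"
    r"(_run-\d+)?"
    r"_bold\.nii(\.gz)?$"
)

def is_bids_bold(filename: str) -> bool:
    """Return True if the filename matches the simplified BOLD pattern."""
    return BIDS_BOLD.fullmatch(filename) is not None
```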
3. Data and Tools Integration in the Canadian Open Neuroscience Platform. Sci Data 2023; 10:189. [PMID: 37024500 PMCID: PMC10079825 DOI: 10.1038/s41597-023-01946-1]
Abstract
We present the Canadian Open Neuroscience Platform (CONP) portal, built to answer the research community's need for flexible data sharing resources and to provide advanced search tools and processing infrastructure. This portal differs from previous data sharing projects in that it integrates datasets originating from a number of already existing platforms or databases through DataLad, a file-level data integrity and access layer. The portal is also an entry point for searching and accessing a large number of standardized and containerized software tools, and links to a computing infrastructure. It leverages community standards to help document and facilitate the reuse of both datasets and tools, and already shows growing community adoption, giving access to more than 60 neuroscience datasets and over 70 tools. The CONP portal demonstrates the feasibility of, and offers a model for, a distributed data and tool management system across 17 institutions throughout Canada.
4. Neurodesk: An accessible, flexible, and portable data analysis environment for reproducible neuroimaging. Research Square 2023:rs.3.rs-2649734. [PMID: 36993557 PMCID: PMC10055538 DOI: 10.21203/rs.3.rs-2649734/v1]
Abstract
Neuroimaging data analysis often requires purpose-built software, which can be challenging to install and may produce different results across computing environments. Beyond being a roadblock to neuroscientists, these issues of accessibility and portability can hamper the reproducibility of neuroimaging data analysis pipelines. Here, we introduce the Neurodesk platform, which harnesses software containers to support a comprehensive and growing suite of neuroimaging software (https://www.neurodesk.org/). Neurodesk includes a browser-accessible virtual desktop environment and a command line interface, mediating access to containerized neuroimaging software libraries on various computing platforms, including personal and high-performance computers, cloud computing and Jupyter Notebooks. This community-oriented, open-source platform enables a paradigm shift for neuroimaging data analysis, allowing for accessible, flexible, fully reproducible, and portable data analysis pipelines.
5. Italian, European, and international neuroinformatics efforts: An overview. Eur J Neurosci 2022. [PMID: 36310103 DOI: 10.1111/ejn.15854]
Abstract
Neuroinformatics is a research field that focusses on software tools capable of identifying, analysing, modelling, organising and sharing multiscale neuroscience data. Neuroinformatics has exploded in the last two decades with the emergence of the Big Data phenomenon, characterised by the so-called 3Vs (volume, velocity and variety), which provided neuroscientists with an improved ability to acquire and process data faster and more cheaply thanks to technical improvements in clinical, genomic and radiological technologies. This situation has led to a 'data deluge', as neuroscientists can routinely collect more study data in a few days than they could in a year just a decade ago. To address this phenomenon, several neuroimaging-focussed neuroinformatics platforms have emerged, funded by national or transnational agencies, with the following goals: (i) development of tools for archiving and organising analytical data (XNAT, REDCap and LabKey); (ii) development of data-driven models evolving from reductionist approaches to multidimensional models (RIN, IVN, HBD, EuroPOND, E-DADS and GAAIN BRAIN); and (iii) development of e-infrastructures to provide sufficient computational power and storage resources (neuGRID, HBP-EBRAINS, LONI and CONP). Although the scenario is still fragmented, there are technological and economical attempts at both national and international levels to introduce high standards for open and Findable, Accessible, Interoperable and Reusable (FAIR) neuroscience worldwide.
6.
Abstract
In this perspective article, we consider the critical issue of data and other research object standardisation and, specifically, how international collaboration, and organizations such as the International Neuroinformatics Coordinating Facility (INCF) can encourage that emerging neuroscience data be Findable, Accessible, Interoperable, and Reusable (FAIR). As neuroscientists engaged in the sharing and integration of multi-modal and multiscale data, we see the current insufficiency of standards as a major impediment in the Interoperability and Reusability of research results. We call for increased international collaborative standardisation of neuroscience data to foster integration and efficient reuse of research objects.
7. FAIRly big: A framework for computationally reproducible processing of large-scale data. Sci Data 2022; 9:80. [PMID: 35277501 PMCID: PMC8917149 DOI: 10.1038/s41597-022-01163-2]
Abstract
Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).
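The machine-actionable provenance records described above boil down to linking a command with content digests of its inputs and outputs, so that a re-execution can be verified by comparing records. A minimal sketch, not the framework's actual DataLad-based format:

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def provenance_record(command: str, inputs: dict, outputs: dict) -> str:
    """inputs/outputs map file names to their (bytes) content; the record
    stores only content digests, so it is cheap to compare across re-executions."""
    record = {
        "command": command,
        "inputs": {name: sha256(blob) for name, blob in inputs.items()},
        "outputs": {name: sha256(blob) for name, blob in outputs.items()},
    }
    return json.dumps(record, sort_keys=True)
```

Two executions of the same command on identical inputs yield identical records; any divergence in an output changes the record and flags the run for inspection.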
8. Numerical uncertainty in analytical pipelines lead to impactful variability in brain networks. PLoS One 2021; 16:e0250755. [PMID: 34724000 PMCID: PMC8559953 DOI: 10.1371/journal.pone.0250755]
Abstract
The analysis of brain-imaging data requires complex processing pipelines to support findings on brain function or pathologies. Recent work has shown that variability in analytical decisions, small amounts of noise, or differences in computational environments can lead to substantial differences in results, endangering trust in conclusions. We explored the instability of results by instrumenting a structural connectome estimation pipeline with Monte Carlo Arithmetic to introduce random noise throughout. We evaluated the reliability of the connectomes, the robustness of their features, and the eventual impact on analysis. The stability of results ranged from perfectly stable (i.e., all digits of data significant) to highly unstable (i.e., 0-1 significant digits). This paper highlights the potential of leveraging induced variance in estimates of brain connectivity to reduce bias in networks without compromising reliability, while increasing the robustness, and the potential upper bound, of their applications in the classification of individual differences. We demonstrate that stability evaluations are necessary for understanding the error inherent in brain-imaging experiments, and that because the techniques used are data- and context-agnostic, such numerical analysis can be applied to typical analytical workflows in brain imaging and in other domains of computational science. Overall, while the extreme variability in results due to analytical instabilities could severely hamper our understanding of brain organization, it also affords us the opportunity to increase the robustness of findings.
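A common convention in Monte Carlo Arithmetic studies estimates the number of significant digits from the spread of results across perturbed executions, roughly as the negative base-10 logarithm of the relative standard deviation. A sketch of that estimate; the cap at 15 digits reflects float64 precision, and exact definitions vary across papers:

```python
import numpy as np

def significant_digits(samples):
    """Estimate stable significant digits across perturbed re-executions."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    std = samples.std(ddof=1)
    if std == 0:
        return 15.0   # no spread observed: effectively all float64 digits stable
    if mean == 0:
        return 0.0
    return max(0.0, min(15.0, -np.log10(std / abs(mean))))
```

Applied per connectome edge, this yields the "all digits significant" to "0-1 significant digits" range the abstract reports.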
9. An analysis of security vulnerabilities in container images for scientific data analysis. Gigascience 2021; 10:giab025. [PMID: 34080631 PMCID: PMC8173661 DOI: 10.1093/gigascience/giab025]
Abstract
Background Software containers greatly facilitate the deployment and reproducibility of scientific data analyses in various platforms. However, container images often contain outdated or unnecessary software packages, which increases the number of security vulnerabilities in the images, widens the attack surface in the container host, and creates substantial security risks for computing infrastructures at large. This article presents a vulnerability analysis of container images for scientific data analysis. We compare results obtained with 4 vulnerability scanners, focusing on the use case of neuroscience data analysis, and quantifying the effect of image update and minification on the number of vulnerabilities. Results We find that container images used for neuroscience data analysis contain hundreds of vulnerabilities, that software updates remove roughly two-thirds of these vulnerabilities, and that removing unused packages is also effective. Conclusions We provide recommendations on how to build container images with fewer vulnerabilities.
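Aggregating scanner findings by severity, and measuring the effect of removing unused packages, can be sketched as follows. The report shape is a hypothetical, scanner-agnostic simplification of what tools such as Trivy or Clair emit; the CVE identifiers and packages are made up for illustration.

```python
import json
from collections import Counter

# Hypothetical scanner output: a list of findings with package and severity.
report = json.loads("""
[{"id": "CVE-2019-0001", "package": "openssl", "severity": "HIGH"},
 {"id": "CVE-2020-0002", "package": "glibc", "severity": "MEDIUM"},
 {"id": "CVE-2021-0003", "package": "perl", "severity": "LOW"}]
""")

def severity_histogram(findings):
    # Count findings per severity level
    return Counter(f["severity"] for f in findings)

def after_removal(findings, unused_packages):
    # Minification: dropping unused packages removes their findings entirely
    return [f for f in findings if f["package"] not in unused_packages]
```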
10. Management and Quality Control of Large Neuroimaging Datasets: Developments From the Barcelonaβeta Brain Research Center. Front Neurosci 2021; 15:633438. [PMID: 33935631 PMCID: PMC8081968 DOI: 10.3389/fnins.2021.633438]
Abstract
Recent decades have witnessed an increasing number of large to very large imaging studies, prominently in the field of neurodegenerative diseases. The datasets collected during these studies form essential resources for the research aiming at new biomarkers. Collecting, hosting, managing, processing, or reviewing those datasets is typically achieved through a local neuroinformatics infrastructure. In particular for organizations with their own imaging equipment, setting up such a system is still a hard task, and relying on cloud-based solutions, albeit promising, is not always possible. This paper proposes a practical model guided by core principles including user involvement, lightweight footprint, modularity, reusability, and facilitated data sharing. This model is based on the experience from an 8-year-old research center managing cohort research programs on Alzheimer’s disease. Such a model gave rise to an ecosystem of tools aiming at improved quality control through seamless automatic processes combined with a variety of code libraries, command line tools, graphical user interfaces, and instant messaging applets. The present ecosystem was shaped around XNAT and is composed of independently reusable modules that are freely available on GitLab/GitHub. This paradigm is scalable to the general community of researchers working with large neuroimaging datasets.
11. Evaluating the Reliability of Human Brain White Matter Tractometry. Aperture Neuro 2021; 1. [PMID: 35079748 PMCID: PMC8785971 DOI: 10.52294/e6198273-b8e3-4b63-babb-6e6b0da10669]
Abstract
The validity of research results depends on the reliability of analysis methods. In recent years, there have been concerns about the validity of research that uses diffusion-weighted MRI (dMRI) to understand human brain white matter connections in vivo, in part based on the reliability of analysis methods used in this field. We defined and assessed three dimensions of reliability in dMRI-based tractometry, an analysis technique that assesses the physical properties of white matter pathways: (1) reproducibility, (2) test-retest reliability, and (3) robustness. To facilitate reproducibility, we provide software that automates tractometry (https://yeatmanlab.github.io/pyAFQ). In measurements from the Human Connectome Project, as well as clinical-grade measurements, we find that tractometry has high test-retest reliability that is comparable to most standardized clinical assessment tools. We find that tractometry is also robust: showing high reliability with different choices of analysis algorithms. Taken together, our results suggest that tractometry is a reliable approach to analysis of white matter connections. The overall approach taken here both demonstrates the specific trustworthiness of tractometry analysis and outlines what researchers can do to establish the reliability of computational analysis pipelines in neuroimaging.
12. File-based localization of numerical perturbations in data analysis pipelines. Gigascience 2020; 9:giaa106. [PMID: 33269388 PMCID: PMC7710495 DOI: 10.1093/gigascience/giaa106]
Abstract
BACKGROUND Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. METHOD We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. RESULTS By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
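Spot's core idea, localizing which process first produces differing outputs across computational conditions, can be sketched as a file-level checksum comparison over an ordered list of pipeline steps. The real tool reconstructs this ordering from ReproZip system-call traces; here the step list is supplied by hand.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def first_divergent_step(steps, run_a, run_b):
    """steps: ordered (name, [output files]) pairs;
    run_a/run_b map file names to the bytes produced under two
    computational conditions. Returns the first step whose outputs differ."""
    for name, outputs in steps:
        if any(digest(run_a[f]) != digest(run_b[f]) for f in outputs):
            return name
    return None
```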
13. A Quantitative EEG Toolbox for the MNI Neuroinformatics Ecosystem: Normative SPM of EEG Source Spectra. Front Neuroinform 2020; 14:33. [PMID: 32848689 PMCID: PMC7427620 DOI: 10.3389/fninf.2020.00033]
Abstract
The Tomographic Quantitative Electroencephalography (qEEGt) toolbox is integrated with the Montreal Neurological Institute (MNI) Neuroinformatics Ecosystem as a Docker container within the Canadian Brain Imaging Research Platform (CBRAIN). qEEGt produces age-corrected normative Statistical Parametric Maps of EEG log source spectra, testing compliance with a normative database. This toolbox was developed at the Cuban Neuroscience Center as part of the first wave of the Cuban Human Brain Mapping Project (CHBMP) and has been validated and used in different health systems for several decades. Incorporation into the MNI ecosystem now provides CBRAIN-registered users access to its full functionality and is accompanied by a public release of the source code on the GitHub and Zenodo repositories. Among its features are the calculation of EEG scalp spectra and the estimation of their source spectra using Variable Resolution Electrical Tomography (VARETA) source imaging. Crucially, this is completed by the evaluation of z spectra by means of built-in age regression equations obtained from the CHBMP database (ages 5-87), providing normative Statistical Parametric Mapping of EEG log source spectra. Scalp and source visualization tools are also provided for evaluating individual subjects prior to further post-processing. Openly releasing this software on the CBRAIN platform will facilitate the use of standardized qEEGt methods in different research and clinical settings. An updated precis of the methods is provided in Appendix I as a reference for the toolbox. qEEGt/CBRAIN is the first installment of instruments developed by the neuroinformatics platform of the Cuba-Canada-China (CCC) project.
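The normative comparison at the heart of qEEGt reduces to z-scoring a subject's log source spectrum against age-expected means and standard deviations. A schematic sketch; the norm_mean and norm_sd callables are placeholders for the CHBMP-derived age regression equations, which are not reproduced here.

```python
import numpy as np

def z_spectrum(log_spectrum, age, norm_mean, norm_sd):
    """norm_mean(age) and norm_sd(age) return the age-expected mean and SD of
    the log source spectrum at each frequency; the result is a z value per
    frequency, testing compliance with the normative database."""
    mu = norm_mean(age)
    sd = norm_sd(age)
    return (np.asarray(log_spectrum, dtype=float) - mu) / sd
```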
14. BIAFLOWS: A Collaborative Framework to Reproducibly Deploy and Benchmark Bioimage Analysis Workflows. Patterns (N Y) 2020; 1:100040. [PMID: 33205108 PMCID: PMC7660398 DOI: 10.1016/j.patter.2020.100040]
Abstract
Image analysis is key to extracting quantitative information from scientific microscopy images, but the methods involved are now often so refined that they can no longer be unambiguously described by written protocols. We introduce BIAFLOWS, an open-source web tool that enables reproducible deployment and benchmarking of bioimage analysis workflows from any software ecosystem. A curated instance of BIAFLOWS, populated with 34 image analysis workflows and 15 microscopy image datasets recapitulating common bioimage analysis problems, is available online. The workflows can be launched and assessed remotely by comparing their performance visually and according to standard benchmark metrics. We illustrate these features by comparing seven nuclei segmentation workflows, including deep-learning methods. BIAFLOWS enables benchmarking and sharing of bioimage analysis workflows, thereby safeguarding research results and promoting high-quality standards in image analysis. The platform is thoroughly documented and ready to gather annotated microscopy datasets and workflows contributed by the bioimaging community.
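Benchmark metrics for segmentation workflows are often overlap scores between predicted and reference masks. Below is a standard Dice coefficient, one of the usual metrics for nuclei segmentation, though not necessarily BIAFLOWS' exact implementation:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom
```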
15. Small Animal Shanoir (SAS): A Cloud-Based Solution for Managing Preclinical MR Brain Imaging Studies. Front Neuroinform 2020; 14:20. [PMID: 32508612 PMCID: PMC7248267 DOI: 10.3389/fninf.2020.00020]
Abstract
Clinical multicenter imaging studies are frequent and rely on a wide range of existing tools for sharing data and processing pipelines. This is not the case for preclinical (small animal) studies. Animal population imaging is still in its infancy, especially because complete standardization and control of initial conditions in animal models across labs remain difficult, and few studies aim at standardizing acquisition and post-processing techniques. There is therefore a clear need for tools dedicated to the management and sharing of data, post-processing, and analysis methods for small animal imaging; solutions developed for human imaging studies cannot be directly applied to this specific domain. In this paper, we present the Small Animal Shanoir (SAS) solution for supporting animal population imaging with tools compatible with open data. The integration of automated workflow tools ensures accessibility and reproducibility of research outputs. By sharing data and imaging processing tools hosted by SAS, we promote reproducibility and reuse, as well as participation in multicenter or replication "open science" studies, contributing to improved quality of science in the preclinical domain. SAS is a first step toward open science for small animal imaging and a contribution to increasing the value of reference data and pipelines.
16. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 2019; 8:giz109. [PMID: 31544213 PMCID: PMC6755254 DOI: 10.1093/gigascience/giz109]
Abstract
BACKGROUND Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
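Several of the scaffold features listed above (command-line argument parsing, logging, a version number, defined exit statuses) can be sketched in a few lines. The names, flag, and toy filtering task below are illustrative, not Bionitio's actual generated code:

```python
import argparse
import logging

# Illustrative fixed exit statuses, one of the best practices named above
EXIT_OK, EXIT_COMMAND_LINE_ERROR = 0, 2

def parse_args(argv):
    """Parse command-line arguments; --version follows the versioning practice."""
    parser = argparse.ArgumentParser(description="Minimal Bionitio-style tool")
    parser.add_argument("--minlen", type=int, default=0,
                        help="ignore sequences shorter than this length")
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.0")
    return parser.parse_args(argv)

def run(seqs, minlen):
    """Prototypical task: keep sequences of at least minlen characters."""
    logging.info("filtering %d sequences with minlen=%d", len(seqs), minlen)
    return [s for s in seqs if len(s) >= minlen]
```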
17. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. Gigascience 2019; 8:giz044. [PMID: 31029061 PMCID: PMC6486472 DOI: 10.1093/gigascience/giz044]
Abstract
BACKGROUND The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning. FINDINGS SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics, and a transcriptomics pipeline. CONCLUSIONS SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
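SciPipe itself is a Go library, but its flow-based, data-driven scheduling idea can be sketched language-agnostically: a task becomes runnable once all of its declared inputs exist. Below is a toy in-memory runner with no relation to SciPipe's actual API:

```python
class Task:
    def __init__(self, name, func, inputs=(), outputs=()):
        self.name, self.func = name, func
        self.inputs, self.outputs = list(inputs), list(outputs)

def run_workflow(tasks, store):
    """store: dict acting as the 'file system'. A task runs once all of its
    declared inputs exist, so scheduling is data-driven rather than imperative."""
    pending = list(tasks)
    order = []
    while pending:
        ready = [t for t in pending if all(i in store for i in t.inputs)]
        if not ready:
            raise RuntimeError("deadlock: some tasks have unmet inputs")
        for t in ready:
            results = t.func(*(store[i] for i in t.inputs))
            if not isinstance(results, tuple):
                results = (results,)
            store.update(zip(t.outputs, results))
            order.append(t.name)
            pending.remove(t)
    return order
```

Note that task submission order does not matter: dependencies are resolved from the declared inputs and outputs, which is what makes nested or dynamically parameterized pipelines tractable.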
18. A Serverless Tool for Platform Agnostic Computational Experiment Management. Front Neuroinform 2019; 13:12. [PMID: 30890927 PMCID: PMC6411646 DOI: 10.3389/fninf.2019.00012]
Abstract
Neuroscience has been carried into the domain of big data and high-performance computing (HPC) on the back of initiatives in data collection and increasingly compute-intensive tools. While managing HPC experiments requires considerable technical acumen, platforms and standards have been developed to ease this burden on scientists. While web portals make resources widely accessible, data organization standards such as the Brain Imaging Data Structure and tool description languages such as Boutiques provide researchers with a foothold to tackle these problems using their own datasets, pipelines, and environments. While these standards lower the barrier to adoption of HPC and cloud systems for neuroscience applications, they still require the consolidation of disparate domain-specific knowledge. We present Clowdr, a lightweight tool to launch experiments on HPC systems and clouds, record rich execution records, and enable the accessible sharing and re-launch of experimental summaries and results. Clowdr uniquely sits between web platforms and bare-metal applications for experiment management, preserving the flexibility of do-it-yourself solutions while providing a low barrier for developing, deploying, and disseminating neuroscientific analysis.
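Boutiques-style tool descriptors declare inputs and splice them into a command-line template via value-keys. The following is a pared-down illustration of that substitution idea; real descriptors are JSON-schema-validated and also cover input types, optional flags, containers, and output files, and the `bet` command shown is simply a familiar stand-in.

```python
# Pared-down, illustrative descriptor: each declared input owns a value-key
# token that is replaced in the command-line template at invocation time.
descriptor = {
    "name": "example_tool",
    "command-line": "bet [INFILE] [OUTFILE] -f [FRACTION]",
    "inputs": [
        {"id": "infile", "value-key": "[INFILE]"},
        {"id": "outfile", "value-key": "[OUTFILE]"},
        {"id": "fraction", "value-key": "[FRACTION]"},
    ],
}

def render_command(desc, invocation):
    """Substitute each input's value-key with the invocation's value."""
    cmd = desc["command-line"]
    for inp in desc["inputs"]:
        cmd = cmd.replace(inp["value-key"], str(invocation[inp["id"]]))
    return cmd
```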
19. Integration of "omics" Data and Phenotypic Data Within a Unified Extensible Multimodal Framework. Front Neuroinform 2018; 12:91. [PMID: 30631270 PMCID: PMC6315165 DOI: 10.3389/fninf.2018.00091]
Abstract
Analysis of “omics” data is often a long and segmented process, encompassing multiple stages from initial data collection to processing, quality control and visualization. The cross-modal nature of recent genomic analyses renders this process challenging to both automate and standardize; consequently, users often resort to manual interventions that compromise data reliability and reproducibility. This in turn can produce multiple versions of datasets across storage systems. As a result, scientists can lose significant time and resources trying to execute and monitor their analytical workflows and encounter difficulties sharing versioned data. In 2015, the Ludmer Centre for Neuroinformatics and Mental Health at McGill University brought together expertise from the Douglas Mental Health University Institute, the Lady Davis Institute and the Montreal Neurological Institute (MNI) to form a genetics/epigenetics working group. The objectives of this working group are to: (i) design an automated and seamless process for (epi)genetic data that consolidates heterogeneous datasets into the LORIS open-source data platform; (ii) streamline data analysis; (iii) integrate results with provenance information; and (iv) facilitate structured and versioned sharing of pipelines for optimized reproducibility using high-performance computing (HPC) environments via the CBRAIN processing portal. 
This article outlines the resulting generalizable “omics” framework and its benefits, specifically, the ability to: (i) integrate multiple types of biological and multi-modal datasets (imaging, clinical, demographics and behavioral); (ii) automate the process of launching analysis pipelines on HPC platforms; (iii) remove the bioinformatic barriers that are inherent to this process; (iv) ensure standardization and transparent sharing of processing pipelines to improve computational consistency; (v) store results in a queryable web interface; (vi) offer visualization tools to better view the data; and (vii) provide the mechanisms to ensure usability and reproducibility. This framework for workflows facilitates brain research discovery by reducing human error through automation of analysis pipelines and seamless linking of multimodal data, allowing investigators to focus on research instead of data handling.