Canakoglu A, Pinoli P, Gulino A, Nanni L, Masseroli M, Ceri S. Federated sharing and processing of genomic datasets for tertiary data analysis. Brief Bioinform 2020; 22:5868062. [PMID: 34020536 DOI: 10.1093/bib/bbaa091]
[Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/06/2020] [Revised: 04/05/2020] [Accepted: 04/27/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION
With the spread of biological and clinical uses of next-generation sequencing (NGS) data, many laboratories and health organizations need to share NGS data resources and to easily access and process comprehensively shared genomic data. In most cases, primary and secondary data management of NGS data is done at sequencing stations, and sharing applies to processed data. Building on the previous single-instance GMQL system architecture, here we review the model, language and architectural extensions that open the centralized GMQL system to federated computing.
RESULTS
The result is an extension of a centralized system architecture that supports federated data sharing and query processing. Data are federated through simple data sharing instructions. Queries are assigned to execution nodes and translated into an intermediate representation, whose computation drives the distribution of data and processing. The approach allows federated applications to be written in classical styles: centralized, distributed or externalized.
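To make the idea of an intermediate representation driving the distribution of data and processing more concrete, the following is a minimal illustrative sketch in Python. It is not the GMQL implementation or its API: the `Op` class, the `assign_nodes` function, the operator names, the node names and the placement rule (run an operator where all of its inputs live, otherwise fall back to a default node) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """One operator in a hypothetical intermediate representation (a DAG)."""
    name: str
    inputs: List["Op"] = field(default_factory=list)  # upstream operators
    dataset_node: Optional[str] = None  # set only for operators that read a dataset


def assign_nodes(root: Op, default_node: str) -> dict:
    """Return {operator name: executing node}, computed bottom-up.

    Assumed rule: a reader runs where its dataset is stored; any other
    operator runs where its inputs were produced if they all share one
    node, and on default_node otherwise.
    """
    plan = {}

    def visit(op: Op) -> str:
        if op.name in plan:
            return plan[op.name]
        if op.dataset_node is not None:
            node = op.dataset_node
        else:
            child_nodes = {visit(child) for child in op.inputs}
            node = child_nodes.pop() if len(child_nodes) == 1 else default_node
        plan[op.name] = node
        return node

    visit(root)
    return plan


# Hypothetical query: select regions from a remote dataset, then join them
# with a local dataset.
genes = Op("read_genes", dataset_node="node_A")
peaks = Op("read_peaks", dataset_node="node_B")
selected = Op("select", inputs=[peaks])
joined = Op("join", inputs=[genes, selected])

plan = assign_nodes(joined, default_node="node_A")
```

In this sketch the selection runs on the node that holds the remote dataset, so only the already filtered regions would need to move to the node that performs the join. A real federated planner would also have to weigh data sizes, transfer costs and sharing policies, which are omitted here.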
AVAILABILITY
The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/.
CONTACT
{arif.canakoglu, pietro.pinoli}@polimi.it.