1
Brauneck A, Schmalhorst L, Weiss S, Baumbach L, Völker U, Ellinghaus D, Baumbach J, Buchholtz G. Legal aspects of privacy-enhancing technologies in genome-wide association studies and their impact on performance and feasibility. Genome Biol 2024; 25:154. [PMID: 38872191 PMCID: PMC11170858 DOI: 10.1186/s13059-024-03296-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 06/03/2024] [Indexed: 06/15/2024] Open
Abstract
Genomic data hold huge potential for medical progress but, because of their sensitive nature, require strict safety measures to comply with data protection laws. This conflict is especially pronounced in genome-wide association studies (GWAS), which rely on vast amounts of genomic data to improve medical diagnoses. To ensure both their benefits and sufficient data security, we propose a federated approach combined with privacy-enhancing technologies, drawing on the findings of a systematic review of federated learning and legal regulations in general and applying them to GWAS.
Affiliation(s)
- Alissa Brauneck
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
- Louisa Schmalhorst
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
- Stefan Weiss
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Linda Baumbach
- Department of Health Economics and Health Services Research, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Uwe Völker
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- David Ellinghaus
- Institute of Clinical Molecular Biology (IKMB), Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Gabriele Buchholtz
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
2
Wang X, Dervishi L, Li W, Ayday E, Jiang X, Vaidya J. Privacy-preserving federated genome-wide association studies via dynamic sampling. Bioinformatics 2023; 39:btad639. [PMID: 37856329 PMCID: PMC10612407 DOI: 10.1093/bioinformatics/btad639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/15/2023] [Accepted: 10/18/2023] [Indexed: 10/21/2023] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal security guarantees, their substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. RESULTS This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets that maintains data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate the proposed methods on two real genomic datasets and demonstrate their robustness to between-study heterogeneity and skewed phenotype distributions across a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and its promise for large-scale collaborative GWAS. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/amioamo/TDS.
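The federated idea behind this entry can be illustrated with a minimal sketch, assuming a per-SNP logistic model with an intercept and a genotype dosage g in {0, 1, 2}: each site computes only its local score vector and information matrix, and a coordinator sums them to take a Newton-Raphson step. This is plain federated Newton-Raphson with no cryptographic protection; it is not the two-step dynamic-sampling method of the paper, and the names `site_statistics` and `federated_newton` are hypothetical.

```python
import math

def site_statistics(genotypes, phenotypes, beta):
    """Score vector and Fisher information (negative Hessian) of the logistic
    log-likelihood at one site, for logit P(y = 1) = beta[0] + beta[1] * g."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for g, y in zip(genotypes, phenotypes):
        x = (1.0, float(g))                      # intercept term and genotype dosage
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * g)))
        w = p * (1.0 - p)
        for i in range(2):
            score[i] += (y - p) * x[i]
            for j in range(2):
                info[i][j] += w * x[i] * x[j]
    return score, info

def federated_newton(sites, iterations=25):
    """Coordinator loop: sum per-site statistics, then take a Newton-Raphson step.
    In a real deployment only the 2-vector and 2x2 matrix would leave each site;
    here all sites are simulated in one process."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for genotypes, phenotypes in sites:
            s, h = site_statistics(genotypes, phenotypes, beta)
            for i in range(2):
                score[i] += s[i]
                for j in range(2):
                    info[i][j] += h[i][j]
        (a, b), (c, d) = info
        det = a * d - b * c
        # beta <- beta + info^{-1} @ score, with the 2x2 inverse written out
        beta = [beta[0] + (d * score[0] - b * score[1]) / det,
                beta[1] + (-c * score[0] + a * score[1]) / det]
    return beta
```

Because the coordinator only adds per-site sums, the federated estimate equals the pooled-data estimate up to floating-point rounding. The exchanged aggregates can still reveal information about participants, which is the gap that privacy-enhancing techniques aim to close.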
Affiliation(s)
- Xinyue Wang
- Management Science and Information Systems Department, Rutgers University, New Brunswick, NJ 07102, United States
- Leonard Dervishi
- Department of Computer and Data Sciences, Cleveland, OH 44106, United States
- Wentao Li
- Department of Health Data Science and Artificial Intelligence, Houston, TX 77030, United States
- Erman Ayday
- Department of Computer and Data Sciences, Cleveland, OH 44106, United States
- Xiaoqian Jiang
- Department of Health Data Science and Artificial Intelligence, Houston, TX 77030, United States
- Jaideep Vaidya
- Management Science and Information Systems Department, Rutgers University, New Brunswick, NJ 07102, United States
3
Elsayed WM, Elmogy M, El-Desouky BS. DNA sequence reconstruction based on innovated hybridization technique of probabilistic cellular automata and particle swarm optimization. Inf Sci (N Y) 2020; 547:828-840. [PMID: 32895580 PMCID: PMC7467128 DOI: 10.1016/j.ins.2020.08.102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/24/2019] [Revised: 08/24/2020] [Accepted: 08/27/2020] [Indexed: 11/24/2022]
Abstract
DNA sequence reconstruction is a challenging research problem in computational biology. DNA evolution is too complex to be characterized by a few parameters; therefore, a modeling approach is needed for analyzing DNA patterns. In this paper, we propose a novel framework for DNA pattern analysis. The proposed framework consists of two main stages: the first analyzes the evolution of DNA sequences, and the second performs the reconstruction. We used cellular automata (CA) rules to analyze and predict the DNA sequence. Then, a modified reconstruction procedure is introduced, based on Probabilistic Cellular Automata (PCA) integrated with the Particle Swarm Optimization (PSO) algorithm. This integration makes the proposed framework more efficient and achieves optimal transition rules. Our model rests on the hypothesis that mutations are probabilistic events, so their evolution can be simulated as a PCA model. The main objective of this paper is to analyze various DNA sequences to predict the changes that occur in DNA during evolution (mutations). We used a similarity score as a fitness measure to detect symmetry relations, which is appropriate for numerous extremely long sequences. Results are given for CpG methylation-deamination processes; CpG sites are regions of DNA where a guanine nucleotide follows a cytosine nucleotide in the linear sequence of bases. DNA evolution is handled as evolved colored paradigms. Therefore, incorporating probabilistic components helps to produce a tool capable of foretelling the likelihood of specific mutations. The results also show their capability of dealing with complex relations.
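To make the PCA idea concrete, here is a minimal sketch of one synchronous update of a one-dimensional probabilistic cellular automaton over a DNA string, together with a position-wise similarity score. The transition rule is a hypothetical simplification chosen for illustration (a C in a CpG context becomes T with probability `p_cpg`; any other base mutates with probability `p_background`); it is not the rule set the authors learn with PSO, and the PSO search itself is not shown.

```python
import random

def pca_step(seq, p_cpg, p_background, rng):
    """One synchronous update of a hypothetical 1-D probabilistic CA over DNA.
    Every cell reads the OLD sequence, so all cells update simultaneously."""
    bases = "ACGT"
    out = []
    for i, b in enumerate(seq):
        right = seq[i + 1] if i + 1 < len(seq) else None  # fixed (non-periodic) boundary
        if b == "C" and right == "G":
            # CpG site: simplified methylation-deamination event C -> T
            out.append("T" if rng.random() < p_cpg else b)
        elif rng.random() < p_background:
            # background point mutation to a different, uniformly chosen base
            out.append(rng.choice([x for x in bases if x != b]))
        else:
            out.append(b)
    return "".join(out)

def similarity(a, b):
    """Fraction of aligned positions with identical bases (equal lengths assumed)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(42)
    ancestor = "ACGTTACGCGATCGGA"
    descendant = ancestor
    for _ in range(5):
        descendant = pca_step(descendant, p_cpg=0.3, p_background=0.01, rng=rng)
    print(descendant, similarity(ancestor, descendant))
```

A score such as `similarity` could serve as the fitness that an optimizer like PSO maximizes when searching for transition probabilities; how the paper defines its score in detail is not stated here.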
Affiliation(s)
- Wesam M Elsayed
- Mathematics Dept., Faculty of Science, Mansoura University, Mansoura, Egypt
- Mohammed Elmogy
- Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
- B S El-Desouky
- Mathematics Dept., Faculty of Science, Mansoura University, Mansoura, Egypt
4
Computational Modeling of Proteins based on Cellular Automata: A Method of HP Folding Approximation. Protein J 2018; 37:248-260. [PMID: 29802509 DOI: 10.1007/s10930-018-9771-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/16/2022]
Abstract
The design of a protein folding approximation algorithm is not straightforward, even when a simplified model is used. Folding is a combinatorial problem, for which approximation and heuristic algorithms are usually used to find near-optimal folds of protein primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here uses two-dimensional cellular automata to fold proteins represented in a well-studied simplified model, the hydrophobic-hydrophilic (HP) model. Cellular automata are discrete computational models that rely on local rules to produce global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H-H contacts. These algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H's at even positions and the other side ignores H's at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H-H endpoints. This guaranteed lower bound applies to short sequences only; it is proved by showing that the core and the folds of the protein have two identical sides for all short sequences.
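For readers unfamiliar with the HP model, the quantity these approximation ratios refer to can be computed with a short sketch: place the chain on the 2-D square lattice as a self-avoiding walk and count pairs of H residues that are lattice neighbours but not chain neighbours (the fold's energy is conventionally minus this count). The move encoding (`R`, `L`, `U`, `D`) and the name `hh_contacts` are illustrative assumptions; the sketch scores a given fold and does not implement the CA folding procedure.

```python
MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def hh_contacts(sequence, moves):
    """Count topological H-H contacts of a fold on the 2-D square lattice.
    `moves` holds len(sequence) - 1 absolute steps; returns None if the walk
    is not self-avoiding (i.e. not a valid fold)."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        return None                               # two residues on one lattice site
    position = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = position.get((x + dx, y + dy))
            # count each pair once (j > i) and skip chain neighbours (j == i + 1)
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts
```

For example, `hh_contacts("HHHH", "RUL")` folds the chain into a unit square and finds one contact between residues 0 and 3, while the straight chain `"RRR"` has none.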
5
Application of local rules and cellular automata in representing protein translation and enhancing protein folding approximation. PROGRESS IN ARTIFICIAL INTELLIGENCE 2018. [DOI: 10.1007/s13748-018-0146-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
6
Swaminathan V, Rajaram G, Abhishek V, Reddy BS, Kannan K. A Novel Hypergraph-Based Genetic Algorithm (HGGA) Built on Unimodular and Anti-homomorphism Properties for DNA Sequencing by Hybridization. Interdiscip Sci 2017; 11:397-411. [PMID: 29110287 DOI: 10.1007/s12539-017-0267-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/18/2017] [Revised: 10/11/2017] [Accepted: 10/16/2017] [Indexed: 11/29/2022]
Abstract
Sequencing by hybridization (SBH), which determines the order in which nucleotides occur on a DNA string, is still being enhanced with computational intelligence, even though next-generation DNA sequencing has come into existence. In the last decade, many works on graph theory-based DNA sequencing have been published. This paper proposes a method for SBH that integrates a hypergraph with a genetic algorithm (HGGA), yielding a novel analytic technique to obtain a DNA sequence from its spectrum. The paper represents the elements of the spectrum and their relations as a hypergraph and applies the unimodular property to ensure the compatibility of relations between l-mers. The hypergraph representation and unimodular property are combined with a genetic algorithm customized with novel selection and crossover operators that reduce computational complexity and accelerate convergence. Subsequently, once the primary strand has been determined, an anti-homomorphism is invoked to find the reverse complement of the sequence. The proposed algorithm is applied to the GenBank BioServer datasets, and the results demonstrate its efficiency. The HGGA is a non-classical algorithm with significant advantages and computationally attractive complexity reductions ranging to [Formula: see text], with improved accuracy that makes it suitable for applications other than DNA sequencing, such as image processing, task scheduling and big data processing.
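The two basic objects named in this abstract, the l-mer spectrum and the reverse complement, can be sketched in a few lines, assuming an idealized, error-free hybridization experiment. Reverse complementation is an anti-homomorphism on strings, meaning rc(xy) = rc(y) rc(x). The function names are hypothetical, and the hypergraph and genetic-algorithm parts of HGGA are not shown.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def spectrum(seq, l):
    """All distinct l-mers of seq, i.e. what an ideal SBH chip would report."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}

def reverse_complement(seq):
    """Reverse the strand and complement each base (A<->T, C<->G)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def consistent_with(candidate, observed, l):
    """A candidate reconstruction is acceptable if it reproduces the observed spectrum."""
    return spectrum(candidate, l) == observed

if __name__ == "__main__":
    print(sorted(spectrum("ATGCA", 3)))      # ['ATG', 'GCA', 'TGC']
    print(reverse_complement("ATGC"))        # GCAT
    x, y = "AAC", "GT"
    print(reverse_complement(x + y) == reverse_complement(y) + reverse_complement(x))  # True
```

A reconstruction algorithm such as HGGA searches for a sequence that passes a check like `consistent_with`; the reverse complement then gives the second strand without further search.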
Affiliation(s)
- V Swaminathan
- Discrete Mathematics Research Laboratory, Srinivasa Ramanujan Centre, SASTRA University, Thanjavur, India; School of Humanities and Sciences, SASTRA University, Thanjavur, India
- Gangothri Rajaram
- School of Computing, SASTRA University, Thanjavur, Tamilnadu, India; School of Humanities and Sciences, SASTRA University, Thanjavur, India
- V Abhishek
- School of Computing, SASTRA University, Thanjavur, Tamilnadu, India; School of Humanities and Sciences, SASTRA University, Thanjavur, India
- Boosi Shashank Reddy
- School of Computing, SASTRA University, Thanjavur, Tamilnadu, India; School of Humanities and Sciences, SASTRA University, Thanjavur, India
- K Kannan
- Discrete Mathematics Research Laboratory, Srinivasa Ramanujan Centre, SASTRA University, Thanjavur, India; School of Computing, SASTRA University, Thanjavur, Tamilnadu, India; School of Humanities and Sciences, SASTRA University, Thanjavur, India
7
Towards implementation of cellular automata in Microbial Fuel Cells. PLoS One 2017; 12:e0177528. [PMID: 28498871 PMCID: PMC5428934 DOI: 10.1371/journal.pone.0177528] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/17/2016] [Accepted: 04/29/2017] [Indexed: 11/19/2022] Open
Abstract
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer that converts waste products into electricity using microbial communities. A cellular automaton (CA) is a uniform array of finite-state machines that update their states in discrete time, depending on the states of their closest neighbors, according to the same rule. Arrays of MFCs could, in principle, act as massively parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing a CA in MFCs. We chose Conway's Game of Life as the 'benchmark' CA because it is the most popular CA and exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions between states during computation, compared with silicon circuitry.
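The benchmark rule that the MFC design realizes in hardware is easy to state in software. Below is a minimal sketch of one synchronous Game of Life update, assuming a toroidal (wrap-around) grid of 0/1 cells; the boundary choice and the name `life_step` are assumptions of this sketch, not details of the MFC circuit.

```python
def life_step(grid):
    """One synchronous Game of Life update (rule B3/S23) on a toroidal 0/1 grid.
    A live cell survives with 2 or 3 live neighbours; a dead cell with exactly
    3 live neighbours becomes live; every other cell is dead next step."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Moore neighbourhood: the 8 surrounding cells, wrapping at the edges
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            if grid[r][c] == 1:
                new[r][c] = 1 if n in (2, 3) else 0
            else:
                new[r][c] = 1 if n == 3 else 0
    return new
```

Every cell reads only the previous generation, which mirrors the property the MFC array exploits: all elementary processors can update at once using purely local information.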
8
Li J, Liew TCH. Cellular automata in photonic cavity arrays. OPTICS EXPRESS 2016; 24:24930-24937. [PMID: 27828433 DOI: 10.1364/oe.24.024930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
We theoretically propose a photonic Turing machine based on cellular automata in arrays of nonlinear cavities coupled with artificial gauge fields. The state of the system is recorded using the bistability of driven cavities, in which losses are fully compensated by an external continuous drive. The sequential update of the automaton layers is achieved automatically, through the local switching of bistable states, without requiring any additional synchronization or temporal control.
9
Dourvas N, Tsompanas MA, Sirakoulis GC, Tsalides P. Hardware Acceleration of Cellular Automata Physarum polycephalum Model. ACTA ACUST UNITED AC 2015. [DOI: 10.1142/s012962641540006x] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
During the past decades, computer scientists have been inspired by the study of biological organisms, and bio-inspired algorithms have been developed that can often give excellent solutions at low computational cost to complex engineering problems. In our case, the plasmodium of Physarum polycephalum is capable of finding the shortest path between two points in a labyrinth. In this study, we implement in hardware a Cellular Automata (CA) model that attempts to describe and mimic the behavior of the plasmodium in a maze. Beyond the successful software implementation of the CA-based Physarum model, and in order to take full advantage of the inherent parallelism of CA, we focus on a Field Programmable Gate Array (FPGA) implementation of the proposed model. Two different implementations were considered, differing in the precision of the numerical representation of the CA model parameters. Based on the corresponding shortest-path results in the labyrinth, the modeling efficiency of both approaches was compared in terms of the resulting error propagation. The presented FPGA implementations succeed in taking advantage of the CA's inherent parallelism and improve the performance of the CA algorithm compared with software in terms of computational speed and power consumption. As a result, the implementations presented here can also be considered a preliminary CA-based Physarum polycephalum IP core that produces a biologically inspired solution to the shortest-path problem.
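As a much simpler stand-in for the Physarum model, the sketch below shows how a purely local, synchronous CA rule can still solve the labyrinth shortest-path problem: each free cell repeatedly sets its state to one plus the smallest state among its four neighbours, and the resulting distance field is then followed downhill from the target. This is a wavefront (distance-relaxation) CA chosen for illustration, not the plasmodium dynamics or the numerical representations studied in the paper; the maze encoding ('#' for walls, '.' for free cells) and the function names are assumptions.

```python
INF = float("inf")
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))       # von Neumann neighbourhood

def ca_distance_field(maze, source):
    """Synchronous CA: each free cell holds its distance to `source`, updated from
    its 4 neighbours until no cell changes. Walls ('#') stay at infinity."""
    rows, cols = len(maze), len(maze[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in dist]            # read old states, write new ones
        for r in range(rows):
            for c in range(cols):
                if maze[r][c] == "#":
                    continue                      # walls never change state
                for dr, dc in STEPS:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and dist[rr][cc] + 1 < new[r][c]:
                        new[r][c] = dist[rr][cc] + 1
                        changed = True
        dist = new
    return dist

def shortest_path(dist, target):
    """Walk downhill through the distance field from target back to the source."""
    r, c = target
    if dist[r][c] == INF:
        return None                               # target unreachable from source
    path = [(r, c)]
    while dist[r][c] > 0:
        for dr, dc in STEPS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(dist) and 0 <= cc < len(dist[0]) and dist[rr][cc] == dist[r][c] - 1:
                r, c = rr, cc
                break
        path.append((r, c))
    return path[::-1]
```

Each update touches only a cell and its neighbours, so the inner double loop is exactly the part an FPGA could evaluate for all cells in parallel.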
Affiliation(s)
- Nikolaos Dourvas
- Department of Electrical and Computer Engineering, Democritus University of Thrace Xanthi, GR 67100, Greece
- Georgios Ch. Sirakoulis
- Department of Electrical and Computer Engineering, Democritus University of Thrace Xanthi, GR 67100, Greece
- Philippos Tsalides
- Department of Electrical and Computer Engineering, Democritus University of Thrace Xanthi, GR 67100, Greece
11
Tsompanas MAI, Sirakoulis GC. Modeling and hardware implementation of an amoeba-like cellular automaton. BIOINSPIRATION & BIOMIMETICS 2012; 7:036013. [PMID: 22570143 DOI: 10.1088/1748-3182/7/3/036013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/31/2023]
Abstract
Over the last few years, an increasing number of publications have shown that living organisms are very effective at finding solutions to complex mathematical problems that usually demand large computational resources. The plasmodium of the slime mould Physarum polycephalum is a successful example that has been used to solve path-finding problems on graphs and combinatorial problems. The cellular automaton (CA) computational model can capture the essential features of systems in which global behavior emerges from the collective effect of simple components that interact locally (emergent computation). We developed a CA that models Physarum's behavior exactly and applied it to finding the minimum-length path between two points in a labyrinth, as well as to solving a path-planning problem by guiding the development of adaptive networks, as in the case of the actual rail network of Tokyo. The CA results are in very good agreement with the results of the living-organism experiments in both cases. Moreover, our CA hardware implementation delivers faster and more effective computation because of its inherent parallel nature. Consequently, our CA, implemented in both software and hardware, can serve as a powerful, low-cost virtual laboratory that models the computational behavior of the slime mould Physarum.
Affiliation(s)
- Michail-Antisthenis I Tsompanas
- Department of Electrical and Computer Engineering, Democritus University of Thrace, University Campus, Xanthi GR-67100, Greece.
12
D'Onofrio DJ, Abel DL, Johnson DE. Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems. Theor Biol Med Model 2012; 9:8. [PMID: 22413926 PMCID: PMC3319427 DOI: 10.1186/1742-4682-9-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/14/2011] [Accepted: 03/14/2012] [Indexed: 11/26/2022] Open
Abstract
The fields of molecular biology and computer science have cooperated in recent years to create a synergy between the cybernetic and biosemiotic relationships found in cellular genomics and those of information and language found in computational systems. Biological information frequently manifests its "meaning" through instruction or actual production of formal bio-function. Such information is called Prescriptive Information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition, suggesting that both prescribed data and prescribed algorithms are constituents of PI. This paper examines this dichotomy as expressed in both the genetic code and the central dogma of protein synthesis. An example of a genetic algorithm is modeled after the ribosome, and an examination of the protein synthesis process is used to differentiate PI data from PI algorithms.
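The data/algorithm distinction drawn here can be illustrated with a small, hypothetical sketch: the standard genetic code is held as a lookup table (prescribed data), while a separate ribosome-like loop (prescribed algorithm) reads codons from the first AUG until a stop codon. It assumes no introns and no alternative start codons, and it is only an illustration of the dichotomy, not the genetic algorithm modeled in the paper.

```python
# Prescribed data: the standard genetic code, indexed by the bases in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"    # codons starting with T
         "LLLLPPPPHHQQRRRR"    # codons starting with C
         "IIIMTTTTNNKKSSRR"    # codons starting with A
         "VVVVAAAADDEEGGGG")   # codons starting with G ('*' marks a stop codon)
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Prescribed algorithm: a ribosome-like reading procedure that consults the table.
def translate(mrna):
    """Start at the first AUG, read codons in frame, stop at the first stop codon."""
    dna = mrna.upper().replace("U", "T")        # work in the DNA alphabet
    start = dna.find("ATG")
    if start < 0:
        return ""                               # no start codon: nothing is made
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Swapping `CODON_TABLE` for a variant code changes the output without touching `translate`, and vice versa, which is the separation between PI data and PI algorithms that the abstract describes.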
Affiliation(s)
- David J D'Onofrio
- Control Systems Modeling and Simulation, General Dynamics, Sterling Heights MI, USA.
13
Warden AC, Little BA, Haritos VS. A cellular automaton model of crystalline cellulose hydrolysis by cellulases. BIOTECHNOLOGY FOR BIOFUELS 2011; 4:39. [PMID: 22005054 PMCID: PMC3214134 DOI: 10.1186/1754-6834-4-39] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/16/2011] [Accepted: 10/17/2011] [Indexed: 05/07/2023]
Abstract
BACKGROUND Cellulose from plant biomass is an abundant, renewable material that could be a major feedstock for low-emission transport fuels such as cellulosic ethanol. Cellulase enzymes that break down cellulose into fermentable sugars are of different types (cellobiohydrolases I and II, endoglucanase and β-glucosidase) with separate functions. They form a complex interacting network among themselves, soluble hydrolysis product molecules, solution- and solid-phase substrates, and inhibitors. Many models have been proposed for enzymatic saccharification; however, none has yet employed a cellular automaton approach, which allows important phenomena, such as enzyme crowding on the surface of solid substrates, denaturation and substrate inhibition, to be considered in the model. RESULTS The Cellulase 4D model was developed de novo, taking into account the size and composition of the substrate; surface-acting enzymes were ascribed behaviors based on their movements, catalytic activities and rates, affinity for, and potential for crowding of, the cellulose surface, substrates and inhibitors, and denaturation rates. A base case modeled on literature-derived parameters obtained from Trichoderma reesei cellulases produced cellulose hydrolysis curves that closely matched curves obtained from published experimental data. Scenarios tested in the model included variation of enzyme loadings, adsorption strengths of surface-acting enzymes and reaction periods, and the effect on saccharide production over time was assessed. The model simulations indicated an optimal enzyme loading of between 0.5 and 2 times the base-case concentrations, where a balance was obtained between enzyme crowding on the cellulose crystal, and that the affinities of enzymes for the cellulose surface had a large effect on cellulose hydrolysis. In addition, improvements to the cellobiohydrolase I activity period substantially improved overall glucose production.
CONCLUSIONS Cellulase 4D simulates the enzymatic hydrolysis of cellulose to glucose by surface and solution phase-acting enzymes and accounts for complex phenomena that have previously not been included in cellulose hydrolysis models. The model is intended as a tool for industry, researchers and educators alike to explore options for enzyme engineering and process development and to test hypotheses regarding cellulase mechanisms.
Affiliation(s)
- Andrew C Warden
- CSIRO Energy Transformed Flagship and CSIRO Ecosystems Sciences, PO Box 1700, Canberra, Australian Capital Territory 2601, Australia
- Bryce A Little
- CSIRO Livestock Industries, FD McMaster Laboratory, Armidale, New South Wales 2350, Australia
- CSIRO Livestock Industries, Queensland Biosciences Precinct, 306 Carmody Road, St Lucia, Queensland 4067, Australia
- Victoria S Haritos
- CSIRO Energy Transformed Flagship and CSIRO Ecosystems Sciences, PO Box 1700, Canberra, Australian Capital Territory 2601, Australia
14
Evaluation of dynamic behavior forecasting parameters in the process of transition rule induction of unidimensional cellular automata. Biosystems 2009; 99:6-16. [PMID: 19686802 DOI: 10.1016/j.biosystems.2009.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/17/2009] [Revised: 08/03/2009] [Accepted: 08/06/2009] [Indexed: 11/20/2022]
Abstract
The simulation of the dynamics of cellular systems based on cellular automata (CA) can be computationally expensive. This is particularly true when such simulation is part of a rule-induction procedure to find suitable transition rules for the CA. Several efforts have been described in the literature to make this problem more tractable. This work presents a study of the efficiency of dynamic behavior forecasting parameters (DBFPs) used for the induction of CA transition rules for a specific problem: classification by the majority rule. A total of 8 DBFPs were analyzed for the 31 best-performing rules found in the literature. Some of these DBFPs were highly correlated with each other, meaning they yield the same information. Also, most rules presented DBFP values very close to each other. An evolutionary algorithm based on gene expression programming was developed for finding transition rules according to a given pre-established behavior. The simulation of the dynamic behavior of the CA is not used to evaluate candidate transition rules; instead, the average values of the DBFPs were used as reference. Experiments were done using the DBFPs separately and together. In both cases, the best induced transition rules were not acceptable solutions for the desired behavior of the CA. We conclude that, although the DBFPs represent interesting aspects of the dynamic behavior of CAs, the transition rule induction process still requires simulation of the dynamics and cannot rely on the DBFPs alone.
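For concreteness, the sketch below sets up the majority (density) classification task with the Gacs-Kurdyumov-Levin (GKL) rule, a well-known hand-designed radius-3 rule for this problem, on a periodic lattice, and computes Langton's lambda, a classic example of a forecasting parameter derived from the rule table alone. Whether GKL is among the 31 rules and whether lambda is among the 8 DBFPs analyzed in the paper are not stated in this abstract; both are assumptions of this illustration, as are the function names.

```python
from itertools import product

def gkl_rule(left3, left1, center, right1, right3):
    """GKL rule: a 0 cell takes the majority of itself and its left neighbours at
    distances 1 and 3; a 1 cell takes the majority of itself and its right ones."""
    if center == 0:
        return 1 if left3 + left1 + center >= 2 else 0
    return 1 if center + right1 + right3 >= 2 else 0

def step(cells):
    """One synchronous update on a periodic (ring) lattice."""
    n = len(cells)
    return [gkl_rule(cells[(i - 3) % n], cells[(i - 1) % n], cells[i],
                     cells[(i + 1) % n], cells[(i + 3) % n]) for i in range(n)]

def classify(cells, max_steps=None):
    """Iterate until the lattice is uniform; return its value, or None if it never settles."""
    steps = 2 * len(cells) if max_steps is None else max_steps
    for _ in range(steps + 1):
        if all(c == 1 for c in cells):
            return 1
        if all(c == 0 for c in cells):
            return 0
        cells = step(cells)
    return None

def langton_lambda(rule):
    """Fraction of the 2**7 radius-3 neighbourhoods mapped to the non-quiescent state 1.
    Neighbourhood tuple order is (s-3, s-2, s-1, s0, s+1, s+2, s+3)."""
    outputs = [rule(n[0], n[2], n[3], n[4], n[6]) for n in product((0, 1), repeat=7)]
    return sum(outputs) / len(outputs)
```

Lambda is obtained without running the CA at all, which is what makes such parameters attractive for rule induction; the abstract above concludes that they cannot fully replace simulating the dynamics.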