1
Welzel M, Schwarz PM, Löchel HF, Kabdullayeva T, Clemens S, Becker A, Freisleben B, Heider D. DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage. Nat Commun 2023; 14:628. [PMID: 36746948 PMCID: PMC9902613 DOI: 10.1038/s41467-023-36297-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 01/25/2023] [Indexed: 02/08/2023]
Abstract
The extensive information capacity of DNA, coupled with decreasing costs for DNA synthesis and sequencing, makes DNA an attractive alternative to traditional data storage. The processes of writing, storing, and reading DNA exhibit specific error profiles and constraints DNA sequences have to adhere to. We present DNA-Aeon, a concatenated coding scheme for DNA data storage. It supports the generation of variable-sized encoded sequences with a user-defined Guanine-Cytosine (GC) content, homopolymer length limitation, and the avoidance of undesired motifs. It further enables users to provide custom codebooks adhering to further constraints. DNA-Aeon can correct substitution errors, insertions, deletions, and the loss of whole DNA strands. Comparisons with other codes show better error-correction capabilities of DNA-Aeon at similar redundancy levels with decreased DNA synthesis costs. In-vitro tests indicate high reliability of DNA-Aeon even in the case of skewed sequencing read distributions and high read-dropout.
Affiliation(s)
- Marius Welzel
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Peter Michael Schwarz
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Hannah F Löchel
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Tolganay Kabdullayeva
  - Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Sandra Clemens
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Anke Becker
  - Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Bernd Freisleben
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
2
Löchel HF, Welzel M, Hattab G, Hauschild AC, Heider D. Fractal construction of constrained code words for DNA storage systems. Nucleic Acids Res 2021; 50:e30. [PMID: 34908135 PMCID: PMC8934655 DOI: 10.1093/nar/gkab1209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Revised: 11/16/2021] [Accepted: 11/24/2021] [Indexed: 12/29/2022]
Abstract
The use of complex biological molecules to solve computational problems is an emerging field at the interface between biology and computer science. There are two main categories in which biological molecules, especially DNA, are investigated as alternatives to silicon-based computer technologies. One is to use DNA as a storage medium, and the other is to use DNA for computing. Both strategies come with certain constraints. In the current study, we present a novel approach derived from chaos game representation for DNA to generate DNA code words that fulfill user-defined constraints, namely GC content, homopolymers, and undesired motifs, and thus, can be used to build codes for reliable DNA storage systems.
Affiliation(s)
- Hannah F Löchel
  - Department of Mathematics and Computer Science, University of Marburg, Germany
- Marius Welzel
  - Department of Mathematics and Computer Science, University of Marburg, Germany
- Georges Hattab
  - Department of Mathematics and Computer Science, University of Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, University of Marburg, Germany
3
Hufsky F, Lamkiewicz K, Almeida A, Aouacheria A, Arighi C, Bateman A, Baumbach J, Beerenwinkel N, Brandt C, Cacciabue M, Chuguransky S, Drechsel O, Finn RD, Fritz A, Fuchs S, Hattab G, Hauschild AC, Heider D, Hoffmann M, Hölzer M, Hoops S, Kaderali L, Kalvari I, von Kleist M, Kmiecinski R, Kühnert D, Lasso G, Libin P, List M, Löchel HF, Martin MJ, Martin R, Matschinske J, McHardy AC, Mendes P, Mistry J, Navratil V, Nawrocki EP, O’Toole ÁN, Ontiveros-Palacios N, Petrov AI, Rangel-Pineros G, Redaschi N, Reimering S, Reinert K, Reyes A, Richardson L, Robertson DL, Sadegh S, Singer JB, Theys K, Upton C, Welzel M, Williams L, Marz M. Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research. Brief Bioinform 2021; 22:642-663. [PMID: 33147627 PMCID: PMC7665365 DOI: 10.1093/bib/bbaa232] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 05/27/2020] [Revised: 07/28/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.
Affiliation(s)
- Christian Brandt
  - Institute of Infectious Disease and Infection Control at Jena University Hospital, Germany
- Marco Cacciabue
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) working on FMDV virology at the Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET) and at the Departamento de Ciencias Básicas, Universidad Nacional de Luján (UNLu), Argentina
- Oliver Drechsel
  - bioinformatics department at the Robert Koch-Institute, Germany
- Adrian Fritz
  - Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research, Germany
- Stephan Fuchs
  - bioinformatics department at the Robert Koch-Institute, Germany
- Georges Hattab
  - Bioinformatics Division at Philipps-University Marburg, Germany
- Dominik Heider
  - Data Science in Biomedicine at the Philipps-University of Marburg, Germany
- Stefan Hoops
  - Biocomplexity Institute and Initiative at the University of Virginia, USA
- Lars Kaderali
  - Bioinformatics and head of the Institute of Bioinformatics at University Medicine Greifswald, Germany
- Max von Kleist
  - bioinformatics department at the Robert Koch-Institute, Germany
- René Kmiecinski
  - bioinformatics department at the Robert Koch-Institute, Germany
- Gorka Lasso
  - Chandran Lab, Albert Einstein College of Medicine, USA
- Alice C McHardy
  - Computational Biology of Infection Research Lab at the Helmholtz Centre for Infection Research in Braunschweig, Germany
- Pedro Mendes
  - Center for Quantitative Medicine of the University of Connecticut School of Medicine, USA
- Vincent Navratil
  - Bioinformatics and Systems Biology at the Rhône Alpes Bioinformatics core facility, Université de Lyon, France
- Nicole Redaschi
  - Development of the Swiss-Prot group at the SIB for UniProt and SIB resources that cover viral biology (ViralZone)
- Susanne Reimering
  - Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics at Technical University of Munich, Germany
- Joshua B Singer
  - MRC-University of Glasgow Centre for Virus Research, Glasgow, Scotland, UK
- Chris Upton
  - Department of Biochemistry and Microbiology, University of Victoria, Canada
- Manja Marz
  - Friedrich Schiller University Jena, Germany
4
Martin R, Löchel HF, Welzel M, Hattab G, Hauschild AC, Heider D. CORDITE: The Curated CORona Drug InTERactions Database for SARS-CoV-2. iScience 2020; 23:101297. [PMID: 32619700 PMCID: PMC7305714 DOI: 10.1016/j.isci.2020.101297] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/04/2020] [Revised: 06/09/2020] [Accepted: 06/15/2020] [Indexed: 01/18/2023]
Abstract
Since the outbreak in 2019, researchers are trying to find effective drugs against the SARS-CoV-2 virus based on de novo drug design and drug repurposing. The former approach is very time consuming and needs extensive testing in humans, whereas drug repurposing is more promising, as the drugs have already been tested for side effects, etc. At present, there is no treatment for COVID-19 that is clinically effective, but there is a huge amount of data from studies that analyze potential drugs. We developed CORDITE to efficiently combine state-of-the-art knowledge on potential drugs and make it accessible to scientists and clinicians. The web interface also provides access to an easy-to-use API that allows a wide use for other software and applications, e.g., for meta-analysis, design of new clinical studies, or simple literature search. CORDITE is currently empowering many scientists across all continents and accelerates research in the knowledge domains of virology and drug design.
Affiliation(s)
- Roman Martin
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Hannah F Löchel
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Marius Welzel
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Georges Hattab
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Anne-Christin Hauschild
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Hans-Meerwein-Str. 6, 35032 Marburg, Germany
5
Löchel HF, Riemenschneider M, Frishman D, Heider D. SCOTCH: subtype A coreceptor tropism classification in HIV-1. Bioinformatics 2019; 34:2575-2580. [PMID: 29554213 DOI: 10.1093/bioinformatics/bty170] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/22/2017] [Accepted: 03/14/2018] [Indexed: 01/25/2023]
Abstract
Motivation: The V3 loop of the gp120 glycoprotein of the Human Immunodeficiency Virus 1 (HIV-1) is considered to be responsible for viral coreceptor tropism. gp120 interacts with the CD4 receptor of the host cell and subsequently V3 binds either CCR5 or CXCR4. Due to the fact that the CCR5 coreceptor is targeted by entry inhibitors, a reliable prediction of the coreceptor usage of HIV-1 is of great interest for antiretroviral therapy. Although several methods for the prediction of coreceptor tropism are available, almost all of them have been developed based on only subtype B sequences, and it has been shown in several studies that the prediction of non-B sequences, in particular subtype A sequences, is less reliable. Thus, the aim of the current study was to develop a reliable prediction model for subtype A viruses.
Results: Our new model SCOTCH is based on a stacking approach of classifier ensembles and shows a significantly better performance for subtype A sequences compared to other available models. In particular for low false positive rates (between 0.05 and 0.2, i.e. recommendation in the German and European Guidelines for tropism prediction), SCOTCH shows significantly better prediction performances in terms of partial area under the curves and diagnostic odds ratios compared to existing tools, and thus can be used to reliably predict coreceptor tropism for subtype A sequences.
Availability and implementation: SCOTCH can be downloaded/accessed at http://www.heiderlab.de.
Affiliation(s)
- Hannah F Löchel
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dmitrij Frishman
  - Department of Genome-Oriented Bioinformatics, Technical University of Munich, Freising, Germany; Laboratory of Bioinformatics, St. Petersburg State Polytechnic University, St. Petersburg, Russia
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany