1
Zhang R, Wu H. On secondary structure avoidance of codes for DNA storage. Comput Struct Biotechnol J 2024; 23:140-147. [PMID: 38146435 PMCID: PMC10749251 DOI: 10.1016/j.csbj.2023.11.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/27/2023] [Open access]
Abstract
A secondary structure in single-stranded DNA refers to its propensity to undergo self-folding, leading to functional inactivity and irreparable failures within DNA storage systems. Consequently, the property of secondary structure avoidance (SSA) becomes a crucial criterion in the design of single-stranded DNA sequences for DNA storage, as it prohibits the inclusion of reverse-complement subsequences that contribute to such structures. This work is specifically focused on addressing the avoidance of secondary structures in single-stranded DNA sequences. We propose a novel sequence replacement approach, which successfully resolves the SSA problem under conditions where the stem exceeds a length of $2\log_2 n + 2$, and the loop is of length $k \ge 4$. These parameters have been carefully chosen to closely resemble the real-world scenarios encountered in biochemical processes, enhancing the practical relevance of our study.
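To make the SSA constraint concrete, the following is a minimal sketch of a checker, not the sequence replacement method of the paper (which the abstract does not detail). It assumes a stem is a pair of non-overlapping windows of equal length that are reverse complements of each other, separated by a loop of at least `min_loop` bases. The names `has_secondary_structure` and `stem_bound` are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A/C/G/T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stem_bound(n: int) -> float:
    """The stem-length bound 2*log2(n) + 2 quoted in the abstract."""
    return 2 * math.log2(n) + 2


def has_secondary_structure(seq: str, stem_len: int, min_loop: int) -> bool:
    """Assumed model: True if some window of length stem_len is followed,
    after a gap of at least min_loop bases, by its reverse complement."""
    last_start = len(seq) - stem_len
    # Index every window of length stem_len by its start positions.
    positions: dict[str, list[int]] = {}
    for i in range(last_start + 1):
        positions.setdefault(seq[i:i + stem_len], []).append(i)
    for i in range(last_start + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for j in positions.get(target, []):
            if j - (i + stem_len) >= min_loop:
                return True
    return False
```

For example, `has_secondary_structure("ACGTACAAAAGTACGT", 6, 4)` flags the stem `ACGTAC` paired with `GTACGT` across the four-base loop `AAAA`. A brute-force check like this only detects violations; constructing codes that avoid them is the harder problem the paper addresses.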
Affiliation(s)
- Rui Zhang
  - Chern Institute of Mathematics, Nankai University, Tianjin, 300071, China
- Huaming Wu
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China

2
Rasool A, Hong J, Hong Z, Li Y, Zou C, Chen H, Qu Q, Wang Y, Jiang Q, Huang X, Dai J. An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024:e2301585. [PMID: 38807543 DOI: 10.1002/smtd.202301585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 03/29/2024] [Indexed: 05/30/2024]
Abstract
DNA-based data storage is a new technology in computational and synthetic biology that offers a solution for long-term, high-density data archiving. Given the critical importance of medical data in advancing human health, there is a growing interest in developing an effective medical data storage system based on DNA. Data integrity, accuracy, reliability, and efficient retrieval are all significant concerns. Therefore, this study proposes an Effective DNA Storage (EDS) approach for archiving medical MRI data. The EDS approach incorporates three key components: (i) a novel fraction strategy to address the critical issue of rotating encoding, which often leads to data loss due to single base error propagation; (ii) a novel rule-based quaternary transcoding method that satisfies bio-constraints and ensures reliable mapping; and (iii) an indexing technique designed to simplify random search and access. The effectiveness of this approach is validated through computer simulations and biological experiments, confirming its practicality. The EDS approach outperforms existing methods, providing superior control over bio-constraints and reducing computational time. The results and code provided in this study open new avenues for practical DNA storage of medical MRI data, offering promising prospects for the future of medical data archiving and retrieval.
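The error propagation mentioned in component (i) can be illustrated with a small sketch. It assumes one common form of rotating encoding from the DNA storage literature, in which each ternary digit selects one of the three bases that differ from the previous base. This is an illustration only; the EDS fraction strategy itself is not described in the abstract and is not reproduced here. The function names and the starting base are hypothetical.

```python
BASES = "ACGT"


def rotate_encode(trits: list[int], start: str = "A") -> str:
    """Map each ternary digit (0, 1, 2) to a base that differs from the
    previous base, so the output never repeats a base twice in a row."""
    prev = start
    out = []
    for trit in trits:
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def rotate_decode(seq: str, start: str = "A") -> list[int]:
    """Invert rotate_encode. Raises ValueError if a base repeats the
    previous one, which cannot occur in a valid encoding."""
    prev = start
    trits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


original = rotate_encode([0, 1, 1])   # encodes to "CGC"
corrupted = "CAC"                     # one substitution: G -> A at position 1
decoded = rotate_decode(corrupted)    # two digits change, not one
```

Because each decoded digit depends on the previous base, the single substitution above changes both the digit at the error position and the digit after it (`[0, 1, 1]` becomes `[0, 0, 0]`).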
Affiliation(s)
- Abdur Rasool
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
- Jingwei Hong
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
  - College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Zhiling Hong
  - Quanzhou Development Group Co., Ltd, Quanzhou, 362000, China
- Yuanzhen Li
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Chao Zou
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hui Chen
  - Shenzhen Polytechnic University, Shenzhen, 518055, China
- Qiang Qu
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yang Wang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingshan Jiang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaoluo Huang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Junbiao Dai
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518055, China

3
Yu M, Tang X, Li Z, Wang W, Wang S, Li M, Yu Q, Xie S, Zuo X, Chen C. High-throughput DNA synthesis for data storage. Chem Soc Rev 2024; 53:4463-4489. [PMID: 38498347 DOI: 10.1039/d3cs00469d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
With the explosion of the digital world, the dramatically increasing data volume is expected to reach 175 ZB (1 ZB = $10^{12}$ GB) in 2025. Storing such a huge volume of global data would consume tons of resources. Fortunately, it has been found that the deoxyribonucleic acid (DNA) molecule is the most compact and durable information storage medium in the world so far. Its high coding density and long-term preservation properties make it one of the best data storage carriers for the future. High-throughput DNA synthesis is a key technology for "DNA data storage", which encodes a binary data stream (0/1) into long quaternary DNA sequences consisting of four bases (A/G/C/T). In this review, the workflow of DNA data storage and the basic methods of artificial DNA synthesis technology are outlined first. Then, the technical characteristics of different synthesis methods and the state of the art of representative commercial companies, with a primary focus on silicon chip microarray-based synthesis and novel enzymatic DNA synthesis, are presented. Finally, the recent status of DNA storage and new opportunities for future development in the field of high-throughput, large-scale DNA synthesis technology are summarized.
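The binary-to-quaternary step described above can be sketched as follows. This is a minimal illustration assuming the simplest possible mapping of two bits per base; the particular bit-to-base assignment is an arbitrary choice made here, and practical schemes add constraints and error correction that this sketch omits. The unit check at the end assumes decimal prefixes (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes).

```python
# Assumed mapping: two bits per base; the assignment itself is arbitrary.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Unit check with decimal prefixes: 1 ZB / 1 GB = 10**21 / 10**9 = 10**12.
GB_PER_ZB = 10**21 // 10**9
```

Under this mapping the byte `0x41` (binary `01000001`) becomes `CAAC`, and `dna_to_bytes(bytes_to_dna(data))` returns the original data.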
Affiliation(s)
- Meng Yu
  - Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
  - School of Microelectronics, Shanghai University, 201800, Shanghai, China
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaohui Tang
  - Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Zhenhua Li
  - Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Weidong Wang
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Shaopeng Wang
  - Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Min Li
  - Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Qiuliyang Yu
  - Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
- Sijia Xie
  - Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
  - School of Microelectronics, Shanghai University, 201800, Shanghai, China
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaolei Zuo
  - Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Chang Chen
  - Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
  - School of Microelectronics, Shanghai University, 201800, Shanghai, China
  - Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
  - State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050, Shanghai, China

4
Ben Shabat D, Hadad A, Boruchovsky A, Yaakobi E. GradHC: highly reliable gradual hash-based clustering for DNA storage systems. Bioinformatics (Oxford, England) 2024; 40:btae274. [PMID: 38648049 DOI: 10.1093/bioinformatics/btae274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/27/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024]
Abstract
MOTIVATION: As data storage challenges grow and existing technologies approach their limits, synthetic DNA emerges as a promising storage solution due to its remarkable density and durability advantages. While cost remains a concern, emerging sequencing and synthetic technologies aim to mitigate it, yet introduce challenges such as errors in the storage and retrieval process. One crucial task in a DNA storage system is clustering numerous DNA reads into groups that represent the original input strands. RESULTS: In this paper, we review different methods for evaluating clustering algorithms and introduce a novel clustering algorithm for DNA storage systems, named Gradual Hash-based clustering (GradHC). The primary strength of GradHC lies in its capability to cluster with excellent accuracy various types of designs, including varying strand lengths, cluster sizes (including extremely small clusters), and different error ranges. Benchmark analysis demonstrates that GradHC is significantly more stable and robust than other clustering algorithms previously proposed for DNA storage, while also producing highly reliable clustering results. AVAILABILITY AND IMPLEMENTATION: https://github.com/bensdvir/GradHC.
Affiliation(s)
- Dvir Ben Shabat
  - Department of Computer Science, Technion, Haifa 320003, Israel
- Adar Hadad
  - Department of Computer Science, Technion, Haifa 320003, Israel
- Eitan Yaakobi
  - Department of Computer Science, Technion, Haifa 320003, Israel

5
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] [Open access]
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
  - Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Xiaopeng Wei
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China

6
Berleant JD, Banal JL, Rao DK, Bathe M. Scalable search of massively pooled nucleic acid samples enabled by a molecular database query language. medRxiv: the preprint server for health sciences 2024:2024.04.12.24305660. [PMID: 38699348 PMCID: PMC11064994 DOI: 10.1101/2024.04.12.24305660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
The surge in nucleic acid analytics requires scalable storage and retrieval systems akin to electronic databases used to organize digital data. Such a system could transform disease diagnosis, ecological preservation, and molecular surveillance of biothreats. Current storage systems use individual containers for nucleic acid samples, requiring single-sample retrieval that falls short compared with digital databases that allow complex and combinatorial data retrieval on aggregated data. Here, we leverage protective microcapsules with combinatorial DNA labeling that enables arbitrary retrieval on pooled biosamples analogous to Structured Query Languages. Ninety-six encapsulated pooled mock SARS-CoV-2 genomic samples barcoded with patient metadata are used to demonstrate queries with simultaneous matches to sample collection date ranges, locations, and patient health statuses, illustrating how such flexible queries can be used to yield immunological or epidemiological insights. The approach applies to any biosample database labeled with orthogonal barcodes, enabling complex post-hoc analysis, for example, to study global biothreat epidemiology.
Affiliation(s)
- Joseph D. Berleant
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- James L. Banal
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Cache DNA, Inc. 733 Industrial Rd., San Carlos, CA 94070 USA
- Mark Bathe
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139 USA

7
Hou Z, Qiang W, Wang X, Chen X, Hu X, Han X, Shen W, Zhang B, Xing P, Shi W, Dai J, Huang X, Zhao G. "Cell Disk" DNA Storage System Capable of Random Reading and Rewriting. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2305921. [PMID: 38332565 PMCID: PMC11022697 DOI: 10.1002/advs.202305921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 11/23/2023] [Indexed: 02/10/2024]
Abstract
DNA has emerged as an appealing material for information storage due to its great storage density and durability. Random reading and rewriting are essential tasks for practical large-scale data storage. However, they are currently difficult to implement simultaneously in a single DNA-based storage system, strongly limiting their practicability. Here, a "Cell Disk" storage system is presented, achieving high-density in vivo DNA data storage that enables both random reading and rewriting. In this system, each yeast cell is used as a chamber to store information, similar to a "disk block" but with the ability to self-replicate. Specifically, the genome of each yeast cell has a customized CRISPR/Cas9-based "lock-and-key" module inserted, which allows selective retrieval, erasure, or rewriting of the targeted cell "block" from a pool of cells ("disk"). Additionally, a codec algorithm with lossless compression ability is developed to improve the information density of each cell "block". As a proof of concept, target-specific reading and rewriting of the compressed data from a mimic cell "disk" comprising up to $10^5$ "blocks" are demonstrated and achieve high specificity and reliability. The "Cell Disk" system described here concurrently supports random reading and rewriting, and it should have great scalability for practical data storage use.
Affiliation(s)
- Zhaohua Hou
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wei Qiang
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
- Xiangxiang Wang
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xiaoxu Chen
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xin Hu
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xuye Han
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wenlu Shen
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Bing Zhang
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Peng Xing
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wenping Shi
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Junbiao Dai
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, P. R. China
- Xiaoluo Huang
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
- Guanghou Zhao
  - School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China

8
Gomes CP, Martins AGC, Nunes SE, Ramos B, Wisinewski HR, Reis JLMS, Lima AP, Aoyagi TY, Goncales I, Maia DS, Tunussi AS, Menossi MS, Pereira SM, Turrini PCG, Gervasio JHDB, Verona BM, Cerize NNP. Coding, Decoding and Retrieving a Message Using DNA: An Experience from a Brazilian Center Research on DNA Data Storage. Micromachines 2024; 15:474. [PMID: 38675287 PMCID: PMC11051810 DOI: 10.3390/mi15040474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
DNA data storage based on synthetic oligonucleotides is a major attraction due to the possibility of storage over long periods. Nowadays, the quantity of data generated has been growing exponentially, and the storage capacity needs to keep pace with the growth caused by new technologies and globalization. Since DNA can hold a large amount of information with a high density and remains stable for hundreds of years, this technology offers a solution for current long-term data centers by reducing energy consumption and physical storage space. Currently, research institutes, technology companies, and universities are making significant efforts to meet the growing need for data storage. DNA data storage is a promising field, especially with the advancement of sequencing techniques and equipment, which now make it possible to read genomes (i.e., to retrieve the information) and process this data easily. To overcome the challenges associated with developing new technologies for DNA data storage, a message encoding and decoding exercise was conducted at a Brazilian research center. The exercise performed consisted of synthesizing oligonucleotides by the phosphoramidite route. An encoded message, using a coding scheme that adheres to DNA sequence constraints, was synthesized. After synthesis, the oligonucleotide was sequenced and decoded, and the information was fully recovered.
Affiliation(s)
- Caio P. Gomes
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- André G. C. Martins
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Sabrina E. Nunes
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Bruno Ramos
  - Microfluidic & Photoelectrocatalytic Engineering Group, Department of Chemical Engineering, FEI University Center, São Bernardo do Campo 09850-901, SP, Brazil
- Henrique R. Wisinewski
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- João L. M. S. Reis
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Ariel P. Lima
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Thiago Y. Aoyagi
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Icaro Goncales
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Danilo S. Maia
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Ariane S. Tunussi
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Marília S. Menossi
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Sergio M. Pereira
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Paula C. G. Turrini
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- João H. D. B. Gervasio
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Bruno M. Verona
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil
- Natalia N. P. Cerize
  - Bionanomanufacturing Center, Institute for Technological Research (IPT), Sao Paulo 05508-901, SP, Brazil

9
Wang K, Cao B, Ma T, Zhao Y, Zheng Y, Wang B, Zhou S, Zhang Q. Storing Images in DNA via base128 Encoding. J Chem Inf Model 2024; 64:1719-1729. [PMID: 38385334 DOI: 10.1021/acs.jcim.3c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Current DNA storage schemes lack flexibility and consistency in processing highly redundant and correlated image data, resulting in low sequence stability and image reconstruction rates. Therefore, according to the characteristics of image storage, this paper proposes storing images in DNA via base128 encoding (DNA-base128). In the data writing stage, data segmentation and probability statistics are carried out, and then the data block frequency and the constraint encoding set are associated to achieve encoding. When the image needs to be recovered, DNA-base128 completes internal error correction by threshold setting and drift comparison. Compared with representative work, the DNA-base128 encoding results show that the undesired motifs were reduced by 71.2-90.7% and that the local guanine-cytosine content variance was reduced by a factor of 3, indicating that DNA-base128 can store images more stably. In addition, the structural similarity index (SSIM) and multiscale structural similarity (MS-SSIM) of image reconstruction using DNA-base128 were improved by 19-102% and 6.6-20.3%, respectively. In summary, DNA-base128 provides image encoding with internal error correction and provides a potential solution for DNA image storage. The data and code are available at the GitHub repository: https://github.com/123456wk/DNA_base128.
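The two sequence-quality metrics quoted above can be made concrete with a short sketch. It is not taken from the DNA-base128 repository: the window scheme (non-overlapping windows), the use of population variance, and the function names are assumptions made here for illustration, and the paper's exact definitions may differ.

```python
from statistics import pvariance


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def local_gc_variance(seq: str, window: int) -> float:
    """Population variance of GC content over non-overlapping windows.
    Assumes len(seq) >= window; a trailing partial window is ignored."""
    values = [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
    return pvariance(values)


def count_motifs(seq: str, motifs: list[str]) -> int:
    """Count occurrences (overlaps included) of each undesired motif."""
    return sum(
        1
        for motif in motifs
        for i in range(len(seq) - len(motif) + 1)
        if seq.startswith(motif, i)
    )
```

For instance, `local_gc_variance("GGCCAATT", 4)` compares a window with GC content 1.0 against one with 0.0 and returns 0.25, while `count_motifs("AAAAGC", ["AAA"])` finds two overlapping occurrences of the homopolymer run.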
Affiliation(s)
- Kun Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Ma
  - Brain Function Research Section, China Medical University, Shenyang 110001, China
- Yunzhu Zhao
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China

10
Kiryanova OY, Garafutdinov RR, Gubaydullin IM, Chemeris AV. A novel approach to encode melodies in DNA. Biosystems 2024; 237:105136. [PMID: 38316169 DOI: 10.1016/j.biosystems.2024.105136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/17/2023] [Accepted: 02/02/2024] [Indexed: 02/07/2024]
Abstract
DNA data storage has gained increasing attention in recent decades. DNA molecules can be used to encode non-biological information and are promising carriers owing to their greater data capacity, longer storage duration, and better resilience to technical failures. Here we propose a new method for encoding notes and music in DNA. The encoding technique takes into account the duration and tonality of each note, making it possible to encode all seven octaves by assigning a nucleotide sequence to each key. A set of short sequences is suggested to define the duration of each note. The proposed method allows more complicated melodies to be encoded than the approach based on the Huffman algorithm.
Affiliation(s)
- Olga Yu Kiryanova
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
| | - Ravil R Garafutdinov
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
| | - Irek M Gubaydullin
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
| | - Alexey V Chemeris
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
| |
|
11
|
Yang S, Bögels BWA, Wang F, Xu C, Dou H, Mann S, Fan C, de Greef TFA. DNA as a universal chemical substrate for computing and data storage. Nat Rev Chem 2024; 8:179-194. [PMID: 38337008 DOI: 10.1038/s41570-024-00576-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/10/2024] [Indexed: 02/12/2024]
Abstract
DNA computing and DNA data storage are emerging fields that are unlocking new possibilities in information technology and diagnostics. These approaches use DNA molecules as a computing substrate or a storage medium, offering nanoscale compactness and operation in unconventional media (including aqueous solutions, water-in-oil microemulsions and self-assembled membranized compartments) for applications beyond traditional silicon-based computing systems. Building a functional DNA computer that can process and store molecular information necessitates the continued development of strategies for computing and data storage, as well as bridging the gap between these fields. In this Review, we explore how DNA can be leveraged in the context of DNA computing with a focus on neural networks and compartmentalized DNA circuits. We also discuss emerging approaches to the storage of data in DNA and associated topics such as the writing, reading, retrieval and post-synthesis editing of DNA-encoded data. Finally, we provide insights into how DNA computing can be integrated with DNA data storage and explore the use of DNA for near-memory computing for future information technology and health analysis applications.
Affiliation(s)
- Shuo Yang
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
| | - Bas W A Bögels
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Fei Wang
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Can Xu
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
| | - Hongjing Dou
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
| | - Stephen Mann
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China.
- Centre for Protolife Research and Centre for Organized Matter Chemistry, School of Chemistry, University of Bristol, Bristol, UK.
- Max Planck-Bristol Centre for Minimal Biology, School of Chemistry, University of Bristol, Bristol, UK.
| | - Chunhai Fan
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
| | - Tom F A de Greef
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands.
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Molecules and Materials, Radboud University, Nijmegen, The Netherlands.
- Center for Living Technologies, Eindhoven-Wageningen-Utrecht Alliance, Utrecht, The Netherlands.
| |
|
12
|
Dramé-Maigné A, Espada R, McCallum G, Sieskind R, Gines G, Rondelez Y. In Vitro Enzyme Self-Selection Using Molecular Programs. ACS Synth Biol 2024; 13:474-484. [PMID: 38206581 DOI: 10.1021/acssynbio.3c00385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
Directed evolution provides a powerful route for in vitro enzyme engineering. State-of-the-art techniques functionally screen up to millions of enzyme variants using high throughput microfluidic sorters, whose operation remains technically challenging. Alternatively, in vitro self-selection methods, analogous to in vivo complementation strategies, open the way to even higher throughputs, but have been demonstrated only for a few specific activities. Here, we leverage synthetic molecular networks to generalize in vitro compartmentalized self-selection processes. We introduce a programmable circuit architecture that can link an arbitrary target enzymatic activity to the replication of its encoding gene. Microencapsulation of a bacterial expression library with this autonomous selection circuit results in the single-step and screening-free enrichment of genetic sequences coding for programmed enzymatic phenotypes. We demonstrate the potential of this approach for the nicking enzyme Nt.BstNBI (NBI). We applied autonomous selection conditions to enrich for thermostability or catalytic efficiency, manipulating up to 10⁷ microcompartments and 5 × 10⁵ variants at once. Full gene reads of the libraries using nanopore sequencing revealed detailed mutational activity landscapes, suggesting a key role of electrostatic interactions with DNA in the enzyme's turnover. The most beneficial mutations, identified after a single round of self-selection, provided variants with, respectively, 20 times and 3 °C increased activity and thermostability. Based on a modular molecular programming architecture, this approach does not require complex instrumentation and can be repurposed for other enzymes, including those that are not related to DNA chemistry.
Affiliation(s)
- Adèle Dramé-Maigné
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| | - Rocío Espada
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| | - Giselle McCallum
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| | - Rémi Sieskind
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| | - Guillaume Gines
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| | - Yannick Rondelez
- Gulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France
| |
|
13
|
Ding L, Wu S, Hou Z, Li A, Xu Y, Feng H, Pan W, Ruan J. Improving error-correcting capability in DNA digital storage via soft-decision decoding. Natl Sci Rev 2024; 11:nwad229. [PMID: 38213525 PMCID: PMC10776348 DOI: 10.1093/nsr/nwad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 08/03/2023] [Accepted: 08/15/2023] [Indexed: 01/13/2024] Open
Abstract
Error-correcting codes (ECCs) employed in state-of-the-art DNA digital storage (DDS) systems suffer from a trade-off between error-correcting capability and the proportion of redundancy. To address this issue, in this study, we introduce a soft-decision decoding approach into DDS by proposing a DNA-specific error prediction model and a series of novel strategies. We demonstrate the effectiveness of our approach through a proof-of-concept DDS system based on Reed-Solomon (RS) code, named Derrick. Derrick shows significant improvement in error-correcting capability without involving additional redundancy in both in vitro and in silico experiments, using various sequencing technologies such as Illumina, PacBio and Oxford Nanopore Technology (ONT). Notably, in vitro experiments using ONT sequencing at a depth of 7× reveal that Derrick, compared with the traditional hard-decision decoding strategy, doubles the error-correcting capability of RS code, decreases the proportion of matrices with decoding failure by 229-fold, and amplifies the potential maximum storage volume by an impressive 32 388-fold. Also, Derrick surpasses 'state-of-the-art' DDS systems by comprehensively considering the information density and the minimum sequencing depth required for complete information recovery. Crucially, the soft-decision decoding strategy and key steps of Derrick are generalizable to other ECCs' decoding algorithms.
Affiliation(s)
- Lulu Ding
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Shigang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Zhihao Hou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, China
| | - Alun Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Yaping Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Hu Feng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| |
|
14
|
Gervasio JHDB, da Costa Oliveira H, da Costa Martins AG, Pesquero JB, Verona BM, Cerize NNP. How close are we to storing data in DNA? Trends Biotechnol 2024; 42:156-167. [PMID: 37673693 DOI: 10.1016/j.tibtech.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/31/2023] [Accepted: 08/04/2023] [Indexed: 09/08/2023]
Abstract
DNA is an intelligent data storage medium due to its stability and high density. It has been used by nature for over 3.5 billion years. Compared with traditional methods, DNA offers better compression and physical density. DNA can retain information for thousands of years. However, challenges exist in scalability, standardization, metadata gathering, biocybersecurity, and specialized tools. Addressing these challenges is crucial for widespread implementation. Collaboration among experts, as well as keeping the future in mind, is needed to unlock the full potential of DNA data storage, which promises low energy costs, high-density storage, and long-term stability.
Affiliation(s)
- Joao Henrique Diniz Brandao Gervasio
- Bionanomanufacturing Center, IPT - Institute for Technological Research, Sao Paulo, SP, Brazil; Department of Bioinformatics, UFMG - Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Department of Statistics, University of Oxford, Oxford, UK.
| | | | | | | | - Bruno Marinaro Verona
- Bionanomanufacturing Center, IPT - Institute for Technological Research, Sao Paulo, SP, Brazil
| | | |
|
15
|
Wang S, Mao X, Wang F, Zuo X, Fan C. Data Storage Using DNA. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2024; 36:e2307499. [PMID: 37800877 DOI: 10.1002/adma.202307499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/01/2023] [Indexed: 10/07/2023]
Abstract
The exponential growth of global data has outpaced the storage capacities of current technologies, necessitating innovative storage strategies. DNA, as a natural medium for preserving genetic information, has emerged as a highly promising candidate for a next-generation storage medium. Storing data in DNA offers several advantages, including ultrahigh physical density and exceptional durability. Facilitated by significant advancements in various technologies, such as DNA synthesis, DNA sequencing, and DNA nanotechnology, remarkable progress has been made in the field of DNA data storage over the past decade. However, several challenges still need to be addressed to realize practical applications of DNA data storage. In this review, the processes and strategies of in vitro DNA data storage are first introduced, highlighting recent advancements. Next, a brief overview of in vivo DNA data storage is provided, with a focus on the various writing strategies developed to date. Finally, the challenges encountered in each step of DNA data storage are summarized, and techniques that hold great promise in overcoming these obstacles are discussed.
Affiliation(s)
- Shaopeng Wang
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
| | - Xiuhai Mao
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
| | - Fei Wang
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xiaolei Zuo
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Chunhai Fan
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
| |
|
16
|
Sabary O, Yucovich A, Shapira G, Yaakobi E. Reconstruction algorithms for DNA-storage systems. Sci Rep 2024; 14:1951. [PMID: 38263421 PMCID: PMC10806084 DOI: 10.1038/s41598-024-51730-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024] Open
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string passes through the DNA-storage channel, which introduces deletion, insertion and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, which are used for the shortest common supersequence and the longest common subsequence problems, in order to decode the original string. Our algorithms do not require any limitations on the input or the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a newly synthesized dataset, and are shown to outperform previous algorithms in reconstruction accuracy.
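The two quantities this abstract leans on, the edit distance used as the reconstruction objective and the longest common subsequence used as a building block, can be illustrated with textbook dynamic programming. The following is a minimal sketch only; it is not the authors' reconstruction algorithm, and the function names `edit_distance` and `lcs_length` are chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    original = "ACGTACGT"
    trace = "ACGACGGT"  # hypothetical noisy copy: one deletion and one insertion
    print(edit_distance(original, trace), lcs_length(original, trace))
```

Both routines run in time proportional to the product of the two string lengths and keep only two rows of the table in memory.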
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel.
| | - Alexander Yucovich
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Guy Shapira
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| |
|
17
|
Akash A, Bencurova E, Dandekar T. How to make DNA data storage more applicable. Trends Biotechnol 2024; 42:17-30. [PMID: 37591721 DOI: 10.1016/j.tibtech.2023.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/19/2023]
Abstract
The storage of digital data is becoming a worldwide problem. DNA has been recognized as a biological solution due to its ability to store genetic information without alteration over long periods. The first demonstrations of high-capacity long-lasting DNA digital data storage have been shown. However, high storage costs and slow retrieval of the data must be overcome to make DNA data storage more applicable and marketable. Herein, we discuss the issues and recent advances in DNA data storage methods and highlight pathways to make this technology more applicable to real-world digital data storage. We envision that a combination of molecular biology, nanotechnology, novel polymers, electronics, and automation with systematic development will allow DNA data storage sufficient for everyday use.
Affiliation(s)
- Aman Akash
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
| | - Elena Bencurova
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
| | - Thomas Dandekar
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany.
| |
|
18
|
Yeom H, Kim N, Lee AC, Kim J, Kim H, Choi H, Song SW, Kwon S, Choi Y. Highly Accurate Sequence- and Position-Independent Error Profiling of DNA Synthesis and Sequencing. ACS Synth Biol 2023; 12:3567-3577. [PMID: 37961855 PMCID: PMC10729760 DOI: 10.1021/acssynbio.3c00308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 11/15/2023]
Abstract
A comprehensive error analysis of DNA-stored data during processing, such as DNA synthesis and sequencing, is crucial for reliable DNA data storage. Both synthesis and sequencing errors depend on the sequence and the transition of bases of nucleotides; ignoring either one of the error sources leads to technical challenges in minimizing the error rate. Here, we present a methodology and toolkit that utilizes an oligonucleotide library generated from a 10-base-shifted sequence array, which is individually labeled with unique molecular identifiers, to delineate and profile DNA synthesis and sequencing errors simultaneously. This methodology enables position- and sequence-independent error profiling of both DNA synthesis and sequencing. Using this toolkit, we report base transitional errors in both synthesis and sequencing in general DNA data storage as well as degenerate-base-augmented DNA data storage. The methodology and data presented will contribute to the development of DNA sequence designs with minimal error.
Affiliation(s)
- Huiran Yeom
- Division of Data Science, College of Information and Communication Technology, The University of Suwon, Hwaseong 18323, Republic of Korea
| | - Namphil Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | | | - Jinhyun Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Hamin Kim
- Department of Interdisciplinary Program for Bioengineering, Seoul National University, Seoul 08826, South Korea
| | - Hansol Choi
- Bio-MAX Institute, Seoul National University, Seoul 08826, Republic of Korea
| | - Seo Woo Song
- Basic Science and Engineering Initiative, Children’s Heart Center, Stanford University, Stanford, California 94304, United States
| | - Sunghoon Kwon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Department of Interdisciplinary Program for Bioengineering, Seoul National University, Seoul 08826, South Korea
- Bio-MAX Institute, Seoul National University, Seoul 08826, Republic of Korea
| | - Yeongjae Choi
- School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61105, Republic of Korea
| |
|
19
|
Liu DD, Cheow LF. Rapid Information Retrieval from DNA Storage with Microfluidic Very Large-Scale Integration Platform. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2023:e2309867. [PMID: 38048539 DOI: 10.1002/smll.202309867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 11/09/2023] [Indexed: 12/06/2023]
Abstract
Due to its high information density, DNA is very attractive as a data storage system. However, a major obstacle is the high cost and long turnaround time for retrieving DNA data with next-generation sequencing. Herein, the use of a microfluidic very large-scale integration (mVLSI) platform is described to perform highly parallel and rapid readout of data stored in DNA. Additionally, it is demonstrated that multi-state data encoded in DNA can be deciphered with on-chip melt-curve analysis, thereby further increasing the data content that can be analyzed. The pairing of mVLSI network architecture with exquisitely specific DNA recognition gives rise to a scalable platform for rapid DNA data reading.
Affiliation(s)
- Dong Dong Liu
- Department of Biomedical Engineering and Institute for Health Innovation and Technology, National University of Singapore, Singapore, 119077, Singapore
| | - Lih Feng Cheow
- Department of Biomedical Engineering and Institute for Health Innovation and Technology, National University of Singapore, Singapore, 119077, Singapore
| |
|
20
|
Lin W, Chu L, Su Y, Xie R, Yao X, Zan X, Xu P, Liu W. Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method. Comput Biol Med 2023; 166:107548. [PMID: 37801922 DOI: 10.1016/j.compbiomed.2023.107548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 08/24/2023] [Accepted: 09/28/2023] [Indexed: 10/08/2023]
Abstract
BACKGROUND: In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences could interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect is seldom studied. METHOD: As the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences. RESULTS: Our simulation results indicate that the free energy of DNA sequences at a specific length follows a right-skewed distribution and that the mean increases as the length increases. Given a tolerable free energy threshold of 20 kcal/mol, we could control the ratio of serious secondary structures in the encoding sequences to within 1% of the significant level by selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model could achieve better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²). It achieved MRE = 0.109 and R² = 0.918, respectively, in the simulation experiment. The combination of the BiLSTM and attention module can handle long-term dependencies and capture the features of base pairing. Further, the prediction has linear time complexity, which is suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of 94 predicted free energy can be screened out on a real dataset. It demonstrates that the proposed model could screen out some highly suspicious sequences which are prone to produce more errors and low sequencing copies.
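The two reported metrics can be made concrete with a short sketch. It assumes the standard definitions, MRE as the mean of |predicted − true| / |true| and R² as 1 − SS_res / SS_tot; the abstract does not spell out its formulas, so these definitions, the function names and the sample values below are illustrative assumptions rather than the paper's code or data.

```python
from typing import Sequence


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |pred - true| / |true| (assumed definition; true values must be non-zero)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical free-energy magnitudes in kcal/mol, not values from the paper.
    true_energy = [10.0, 20.0, 30.0, 40.0]
    predicted = [11.0, 18.0, 33.0, 40.0]
    print(mean_relative_error(true_energy, predicted))
    print(r_squared(true_energy, predicted))
```

A lower MRE and an R² closer to 1 both indicate predictions that track the reference free energies more closely.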
Affiliation(s)
- Wanmin Lin
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ling Chu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Yanqing Su
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ranze Xie
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Xiangyu Yao
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Xiangzhen Zan
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Peng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
| | - Wenbin Liu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
| |
|
21
|
Yu M, Lim D, Kim J, Song Y. Processing DNA Storage through Programmable Assembly in a Droplet-Based Fluidics System. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2303197. [PMID: 37755129 PMCID: PMC10646262 DOI: 10.1002/advs.202303197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 07/11/2023] [Indexed: 09/28/2023]
Abstract
DNA can be used to store digital data, and synthetic short-sequence DNA pools are developed to store high quantities of digital data. However, synthetic DNA data cannot be actively processed in DNA pools. An active DNA data editing process is developed using splint ligation in a droplet-controlled fluidics (DCF) system. DNA fragments of discrete sizes (100-500 bps) are synthesized for droplet assembly, and programmed sequence information exchange occurred. The encoded DNA sequences are processed in series and parallel to synthesize the determined DNA pools, enabling random access using polymerase chain reaction amplification. The sequencing results of the assembled DNA data pools can be orderly aligned for decoding and have high fidelity through address primer scanning. Furthermore, eight 90 bps DNA pools with pixel information (png: 0.27-0.28 kB), encoded by codons, are synthesized to create eight 270 bps DNA pools with an animation movie chip file (mp4: 12 kB) in the DCF system.
Affiliation(s)
- Minsang Yu
- Standard Bioelectronics. Co., 511 Michuhol Tower, Gaetbeol-ro 12, Incheon, 21999, South Korea
| | - Doyeon Lim
- Department of Nano-Bioengineering, Incheon National University, Academy-ro 119, Incheon, 22012, South Korea
| | - Jungwoo Kim
- Department of Nano-Bioengineering, Incheon National University, Academy-ro 119, Incheon, 22012, South Korea
| | - Youngjun Song
- Standard Bioelectronics. Co., 511 Michuhol Tower, Gaetbeol-ro 12, Incheon, 21999, South Korea
- Department of Nano-Bioengineering, Incheon National University, Academy-ro 119, Incheon, 22012, South Korea
| |
|
22
|
Yang X, Lai L, Qiang X, Deng M, Xie Y, Shi X, Kou Z. Towards Chinese text and DNA shift encoding scheme based on biomass plasmid storage. FRONTIERS IN BIOINFORMATICS 2023; 3:1276934. [PMID: 37900965 PMCID: PMC10602677 DOI: 10.3389/fbinf.2023.1276934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 09/28/2023] [Indexed: 10/31/2023] Open
Abstract
DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base data that meets biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for storing Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any strand of DNA, even one that has an excessively long homopolymer. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
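The "biological constraints" mentioned in this abstract can be illustrated with a short check. The following is a minimal sketch, not the authors' shift coding scheme: it only measures the longest homopolymer run and, as an added assumption, the GC content of a candidate strand. The function names and the thresholds (`max_run`, `gc_min`, `gc_max`) are hypothetical and chosen for illustration only.

```python
from itertools import groupby


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base in seq."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def meets_constraints(seq: str, max_run: int = 3,
                      gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Hypothetical thresholds; real systems choose their own limits."""
    return (longest_homopolymer(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


if __name__ == "__main__":
    for strand in ("ACGTACGT", "AAAAACGC"):
        print(strand, longest_homopolymer(strand),
              round(gc_content(strand), 2), meets_constraints(strand))
```

An encoder would typically run a check of this kind on each candidate strand and transform or re-encode any strand that fails it.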
Affiliation(s)
- Xu Yang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Langwen Lai
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Xiaoli Qiang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Ming Deng
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Yuhao Xie
- School of Mathematical Science, Inner Mongolia University, Hohhot, China
| | - Xiaolong Shi
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Zheng Kou
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| |
|
23
|
Volkel KD, Lin KN, Hook PW, Timp W, Keung AJ, Tuck JM. FrameD: framework for DNA-based data storage design, verification, and validation. Bioinformatics 2023; 39:btad572. [PMID: 37713474 PMCID: PMC10563143 DOI: 10.1093/bioinformatics/btad572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/27/2023] [Accepted: 09/13/2023] [Indexed: 09/17/2023] Open
Abstract
MOTIVATION: DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components. RESULTS: We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using multiple sequence alignment algorithms and others that do not. We found that the choice to include multiple sequence alignment in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems. AVAILABILITY AND IMPLEMENTATION: The source code for FrameD along with the data generated during the demonstration of FrameD is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).
Affiliation(s)
- Kevin D Volkel
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
| | - Kevin N Lin
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
| | - Albert J Keung
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States
| | - James M Tuck
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
| |
|
24
|
Rasool A, Hong J, Jiang Q, Chen H, Qu Q. BO-DNA: Biologically optimized encoding model for a highly-reliable DNA data storage. Comput Biol Med 2023; 165:107404. [PMID: 37666064 DOI: 10.1016/j.compbiomed.2023.107404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 08/13/2023] [Accepted: 08/26/2023] [Indexed: 09/06/2023]
Abstract
DNA data storage is a promising technology that combines computer simulation and synthetic biology to offer high-density and reliable digital information storage. It is challenging to store massive data in a small amount of DNA without losing the original data, since nonspecific hybridization errors occur frequently and severely affect the reliability of stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to overcome the reliability problem. The BO-DNA model is built on a new rule-based mapping method that avoids data drop during the transcoding of binary data to premier nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps to minimize nonspecific hybridization errors. The robustness of BO-DNA is assessed with four bio-constraints to confirm the reliability of the newly generated DNA sequences. Experimentally, different medical images are encoded and decoded successfully, with lower bounds improved by 12%-59% and optimally constrained DNA sequences reported at an average density of 1.77 bits/nt. BO-DNA's results demonstrate substantial advantages in constructing reliable DNA data storage.
Affiliation(s)
- Abdur Rasool
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, 100049, China.
| | - Jingwei Hong
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
| | - Qingshan Jiang
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, Guangdong, China
| | - Qiang Qu
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| |
|
25
|
Mu Z, Cao B, Wang P, Wang B, Zhang Q. RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. IEEE Trans Nanobioscience 2023; 22:912-922. [PMID: 37028365 DOI: 10.1109/tnb.2023.3254514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
|
26
|
Wang J, Raito H, Shimada N, Maruyama A. A Cationic Copolymer Enhances Responsiveness and Robustness of DNA Circuits. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2023; 19:e2304091. [PMID: 37340578 DOI: 10.1002/smll.202304091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 12/12/2012] [Indexed: 06/22/2023]
Abstract
Toehold-mediated DNA circuits are extensively employed to construct diverse DNA nanodevices and signal amplifiers. However, these circuits operate slowly and are highly susceptible to molecular noise such as interference from bystander DNA strands. This work investigates the effects of a series of cationic copolymers on DNA catalytic hairpin assembly, a representative toehold-mediated DNA circuit. One copolymer, poly(L-lysine)-graft-dextran, enhances the reaction rate by 30-fold due to its electrostatic interaction with DNA. Moreover, the copolymer considerably alleviates the circuit's dependency on the length and GC content of the toehold, thereby enhancing the robustness of circuit operation against molecular noise. The general effectiveness of poly(L-lysine)-graft-dextran is demonstrated through kinetic characterization of a DNA AND logic circuit. Therefore, use of a cationic copolymer is a versatile and efficient approach to enhance the operation rate and robustness of toehold-mediated DNA circuits, paving the way for more flexible design and broader application.
Affiliation(s)
- Jun Wang
- Department of Life Science and Technology, Tokyo Institute of Technology, Nagatsuta-cho 4259 B-57, Midori, Yokohama, 226-8501, Japan
| | - Hayashi Raito
- Department of Life Science and Technology, Tokyo Institute of Technology, Nagatsuta-cho 4259 B-57, Midori, Yokohama, 226-8501, Japan
| | - Naohiko Shimada
- Department of Life Science and Technology, Tokyo Institute of Technology, Nagatsuta-cho 4259 B-57, Midori, Yokohama, 226-8501, Japan
| | - Atsushi Maruyama
- Department of Life Science and Technology, Tokyo Institute of Technology, Nagatsuta-cho 4259 B-57, Midori, Yokohama, 226-8501, Japan
| |
|
27
|
Gimpel AL, Stark WJ, Heckel R, Grass RN. A digital twin for DNA data storage based on comprehensive quantification of errors and biases. Nat Commun 2023; 14:6026. [PMID: 37758710 PMCID: PMC10533828 DOI: 10.1038/s41467-023-41729-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023] Open
Abstract
Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of Error Correction Coding (ECC) which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin's ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.
Affiliation(s)
- Andreas L Gimpel
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
| | - Wendelin J Stark
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
| | - Reinhard Heckel
- Department of Computer Engineering, Technical University of Munich, Arcistrasse 21, 80333, Munich, Germany
| | - Robert N Grass
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland.
| |
|
28
|
Yan Y, Pinnamaneni N, Chalapati S, Crosbie C, Appuswamy R. Scaling logical density of DNA storage with enzymatically-ligated composite motifs. Sci Rep 2023; 13:15978. [PMID: 37749195 PMCID: PMC10519978 DOI: 10.1038/s41598-023-43172-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 09/20/2023] [Indexed: 09/27/2023] Open
Abstract
DNA is a promising candidate for long-term data storage due to its high density and endurance. The key challenge in DNA storage today is the cost of synthesis. In this work, we propose composite motifs, a framework that uses a mixture of prefabricated motifs as building blocks to reduce synthesis cost by scaling logical density. To write data, we introduce Bridge Oligonucleotide Assembly, an enzymatic ligation technique for synthesizing oligos based on composite motifs. To sequence data, we introduce Direct Oligonucleotide Sequencing, a nanopore-based technique to sequence short oligos, eliminating common preparatory steps like DNA assembly, amplification and end-prep. To decode data, we introduce Motif-Search, a novel consensus caller that provides accurate reconstruction despite synthesis and sequencing errors. Using the proposed methods, we present an end-to-end experiment where we store the text "HelloWorld" at a logical density of 84 bits/cycle (14-42× improvement over state-of-the-art).
Affiliation(s)
- Yiqing Yan
- Data Science Department, EURECOM, Biot, France
| |
|
29
|
Zhao Y, Cao B, Wang P, Wang K, Wang B. DBTRG: De Bruijn Trim rotation graph encoding for reliable DNA storage. Comput Struct Biotechnol J 2023; 21:4469-4477. [PMID: 37736298 PMCID: PMC10510065 DOI: 10.1016/j.csbj.2023.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/23/2023] Open
Abstract
DNA is a high-density, long-term stable, and scalable storage medium that can meet the increased demands on storage media resulting from the exponential growth of data. The existing DNA storage encoding schemes tend to achieve high-density storage but do not fully consider the local and global stability of DNA sequences and the read and write accuracy of the stored information. To address these problems, this article presents a graph-based De Bruijn Trim Rotation Graph (DBTRG) encoding scheme. Through XOR between the proposed dynamic binary sequence and the original binary sequence, k-mers can be divided into the De Bruijn Trim graph, and the stored information can be compressed according to the overlapping relationship. The simulated experimental results show that DBTRG ensures base balance and diversity, reduces the likelihood of undesired motifs, and improves the stability of DNA storage and data recovery. Furthermore, DBTRG maintains an encoding rate of 1.92 while storing 510 KB images and introduces novel approaches and concepts for DNA storage encoding methods.
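The XOR step named in this abstract relies on a general property: XOR with a fixed binary sequence is its own inverse, so the same operation both scrambles and restores the data. The sketch below illustrates only that property. The abstract does not describe how the "dynamic binary sequence" is built, so a seeded pseudo-random mask is substituted here as a stand-in; `xor_with_mask` and its `seed` parameter are hypothetical names.

```python
import random


def xor_with_mask(data: bytes, seed: int) -> bytes:
    """XOR data with a pseudo-random mask derived from seed.

    The mask is a stand-in (an assumption) for the paper's dynamic
    binary sequence. Applying the function twice with the same seed
    returns the original bytes.
    """
    rng = random.Random(seed)
    mask = bytes(rng.randrange(256) for _ in data)
    return bytes(d ^ m for d, m in zip(data, mask))


if __name__ == "__main__":
    original = b"\x00\x00\x00\x00 long runs of zeros"
    scrambled = xor_with_mask(original, seed=7)
    restored = xor_with_mask(scrambled, seed=7)
    print(restored == original)
```

Masking of this kind is commonly used to break up long runs of identical bits before they are mapped to bases, which is one way to reduce homopolymers in the resulting strand.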
Affiliation(s)
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Penghao Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| |
|
30
|
Liu Y, Zhang X, Zhang X, Liu X, Wang B, Zhang Q, Wei X. Temporal logic circuits implementation using a dual cross-inhibition mechanism based on DNA strand displacement. RSC Adv 2023; 13:27125-27134. [PMID: 37701285 PMCID: PMC10493850 DOI: 10.1039/d3ra03995a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 08/21/2023] [Indexed: 09/14/2023] Open
Abstract
Molecular circuits crafted from DNA molecules harness the inherent programmability and biocompatibility of DNA to intelligently steer molecular machines in the execution of microscopic tasks. In comparison to combinational circuits, DNA-based temporal circuits boast supplementary capabilities, allowing them to proficiently handle the omnipresent temporal information within biochemical systems and life sciences. However, the lack of temporal mechanisms and components proficient in comprehending and processing temporal information presents challenges in advancing DNA circuits that excel in complex tasks requiring temporal control and time perception. In this study, we engineered temporal logic circuits through the design and implementation of a dual cross-inhibition mechanism, which enables the acceptance and processing of temporal information, serving as a fundamental building block for constructing temporal circuits. By incorporating the dual cross-inhibition mechanism, the temporal logic gates are endowed with cascading capabilities, significantly enhancing the inhibitory effect compared to a cross-inhibitor. Furthermore, we have introduced the annihilation mechanism into the circuit to further augment the inhibition effect. As a result, the circuit demonstrates sensitive time response characteristics, leading to a fundamental improvement in circuit performance. This architecture provides a means to efficiently process temporal signals in DNA strand displacement circuits. We anticipate that our findings will contribute to the design of complex temporal logic circuits and the advancement of molecular programming.
Affiliation(s)
- Yuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xiaokang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xun Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xin Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| |
|
31
|
Park SJ, Kim S, Jeong J, No A, No JS, Park H. Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads. Bioinformatics 2023; 39:btad548. [PMID: 37669160 PMCID: PMC10500082 DOI: 10.1093/bioinformatics/btad548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] Open
Abstract
MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, high writing and reading costs hinder its practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and reading cost reductions of 26.57% and 19.41% compared with the two previous works. AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.
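The use of edit distance to discard abnormal reads can be sketched as follows. This is a generic illustration using the standard Levenshtein dynamic program, not the authors' pipeline (which also uses quality scores and LDPC decoding). The helper `filter_reads` and its `max_distance` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def filter_reads(reads, reference, max_distance):
    """Keep reads within max_distance edits of a reference sequence
    (hypothetical helper; the threshold is an assumption)."""
    return [r for r in reads if edit_distance(r, reference) <= max_distance]


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TTTTTTTT"]
    print(filter_reads(reads, "ACGT", max_distance=1))
```

Edit distance suits variable-length reads because it counts insertions and deletions as well as substitutions, whereas Hamming distance is defined only for sequences of equal length.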
Affiliation(s)
- Seong-Joon Park
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Sunghwan Kim
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, South Korea
| | - Jaeho Jeong
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Albert No
- Department of Electronic and Electrical Engineering, Hongik University, Seoul 04066, South Korea
| | - Jong-Seon No
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Hosung Park
- Department of Computer Engineering, Chonnam National University, Gwangju 61186, South Korea
- Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 61186, South Korea
| |
|
32
|
Raza MH, Desai S, Aravamudhan S, Zadegan R. An outlook on the current challenges and opportunities in DNA data storage. Biotechnol Adv 2023; 66:108155. [PMID: 37068530 PMCID: PMC11060094 DOI: 10.1016/j.biotechadv.2023.108155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 03/23/2023] [Accepted: 04/12/2023] [Indexed: 04/19/2023]
Abstract
Silicon is the gold standard for information storage systems. The exponential generation of digital information will exhaust the global supply of refined silicon. Therefore, investing in alternative information storage materials such as DNA has gained momentum. DNA as a memory material possesses several advantages over silicon-based data storage, including higher storage capacity, data retention, and lower operational energy. Routine DNA data storage approaches encode data into chemically synthesized nucleotide sequences. The scalability of DNA data storage depends on factors such as the cost and the generation of hazardous waste during DNA synthesis, latency of writing and reading, and limited rewriting capacity. Here, we review the current status of DNA data storage encoding, writing, storing, retrieving and reading, and discuss the technology's challenges and opportunities.
Affiliation(s)
- Muhammad Hassan Raza
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA
| | - Salil Desai
- Department of Industrial & Systems Engineering, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
| | - Shyam Aravamudhan
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
| | - Reza Zadegan
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA.
| |
|
33
|
Zheng Y, Cao B, Wu J, Wang B, Zhang Q. High Net Information Density DNA Data Storage by the MOPE Encoding Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2992-3000. [PMID: 37015121 DOI: 10.1109/tcbb.2023.3263521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
DNA has recently been recognized as an attractive storage medium due to its high reliability, capacity, and durability. However, encoding algorithms that simply map binary data to DNA sequences have the disadvantages of low net information density and high synthesis cost. Therefore, this paper proposes an efficient, feasible, and highly robust encoding algorithm called MOPE (Modified Barnacles Mating Optimizer and Payload Encoding). The Modified Barnacles Mating Optimizer (MBMO) algorithm is used to construct the non-payload coding set, and the Payload Encoding (PE) algorithm is used to encode the payload. The results show that the lower bound of the non-payload coding set constructed by the MBMO algorithm is 3%-18% higher than the optimal result of previous work, and theoretical analysis shows that the designed PE algorithm has a net information density of 1.90 bits/nt, which is close to the ideal information capacity of 2 bits per nucleotide. The proposed MOPE encoding algorithm with high net information density and satisfying constraints can not only effectively reduce the cost of DNA synthesis and sequencing but also reduce the occurrence of errors during DNA storage.
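The "ideal information capacity of 2 bits per nucleotide" follows from the four-letter alphabet, since log2(4) = 2. The sketch below shows the simple binary-to-base mapping that the abstract contrasts MOPE with, plus a density calculation. It is not the MOPE algorithm. The base order in `BASES` and the example figures (190 payload bits in a 100 nt strand) are assumptions chosen to reproduce the 1.90 bits/nt figure arithmetically.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Map every 2 bits to one base (2 bits/nt, no constraints enforced)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna for sequences whose length is a multiple of 4."""
    bits = "".join(f"{BASES.index(base):02b}" for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def net_information_density(payload_bits: int, total_nt: int) -> float:
    """Payload bits divided by all synthesized nucleotides, including any
    non-payload overhead such as primers or indexes."""
    return payload_bits / total_nt


if __name__ == "__main__":
    strand = bytes_to_dna(b"DNA")
    print(strand, dna_to_bytes(strand))
    print(net_information_density(payload_bits=190, total_nt=100))
```

A plain mapping like this reaches 2 bits/nt only on the payload itself and enforces no homopolymer or GC constraints, which is why constrained schemes report a net density somewhat below 2.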
|
34
|
Liu Q, Wei Y, Wang Z, Song DP, Cui J, Qi H. Sustainable DNA Data Storage on Cellulose Paper. SMALL METHODS 2023; 7:e2201610. [PMID: 37263984 DOI: 10.1002/smtd.202201610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 04/04/2023] [Indexed: 06/03/2023]
Abstract
DNA is a promising material for high-density and long-term archival data storage. In addition to algorithms for encoding digital information into DNA sequences, DNA writing (chemical synthesis), and DNA reading (sequencing), the preservation of DNA mixtures with high sequence diversity is another critical issue for sustainable, long-term, and large-scale DNA data storage. Here, this work demonstrates a method for low-cost, convenient, and sustainable DNA data storage on cellulose paper. A DNA pool comprising thousands of sequences, in which archival data are encoded, is conveniently stored on cellulose paper through electrostatic adsorption, with a calculated density as high as 15 TB per mm³. This work demonstrates that these digitally encoded DNA pools can be stable for years on the cellulose paper after drying, even when directly exposed to air. Furthermore, the reversible electrostatic adsorption enables repeated loading and retrieval of DNA on and off cellulose paper. Therefore, sustainable DNA preservation on cellulose paper through convenient electrostatic adsorption offers a great advantage in storage capacity and cost, which is crucial for practical systems to achieve large-scale and long-term data storage.
Affiliation(s)
- Qian Liu
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300350, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300350, China
| | - Yanan Wei
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300350, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300350, China
| | - Zhaoguan Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300350, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300350, China
| | - Dong-Po Song
- Tianjin Key Laboratory of Composite and Functional Materials, School of Materials Science and Engineering, Tianjin University, Tianjin, 300350, China
| | - Jingsong Cui
- School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China
| | - Hao Qi
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300350, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300350, China
- Zhejiang Shaoxing Research Institute of Tianjin University, Zhejiang, 312369, China
| |
|
35
|
Abram KZ, Udaondo Z. Leveraging nature to advance data storage: DNA as a storage medium. Microb Biotechnol 2023; 16:1709-1712. [PMID: 37300423 PMCID: PMC10443336 DOI: 10.1111/1751-7915.14291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023] Open
Affiliation(s)
- Kaleb Z. Abram
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Zulema Udaondo
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
|
36
|
Wang P, Cao B, Ma T, Wang B, Zhang Q, Zheng P. DUHI: Dynamically updated hash index clustering method for DNA storage. Comput Biol Med 2023; 164:107244. [PMID: 37453377 DOI: 10.1016/j.compbiomed.2023.107244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/28/2023] [Revised: 06/08/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
The exponential growth of global data leads to the problem of insufficient data storage capacity. DNA storage can be an ideal storage method due to its high storage density and long storage time. However, the DNA storage process is subject to unavoidable errors that can lead to increased cluster redundancy during data reading, which in turn affects the accuracy of the data reads. This paper proposes a dynamically updated hash index (DUHI) clustering method for DNA storage, which clusters sequences by constructing a dynamic core index set and using hash lookup. The proposed clustering method is analyzed in terms of overall reliability evaluation and visualization evaluation. The results show that the DUHI clustering method can reduce the redundancy of more than 10% of the sequences within the cluster and increase the reconstruction rate of the sequences to more than 99%. Therefore, our method solves the high redundancy problem after DNA sequence clustering, improves the accuracy of data reading, and promotes the development of DNA storage.
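To make the idea of clustering by hash lookup concrete, here is a minimal Python sketch. It is not the DUHI algorithm from the paper: the k-mer length, the `min_shared` threshold, and the function name `cluster_reads` are assumptions chosen for illustration. Each cluster keeps one core read whose k-mers are stored in a hash index, the index is updated whenever a new core is created, and each incoming read is assigned by looking up its own k-mers.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=4, min_shared=3):
    """Greedy clustering with a k-mer hash index (illustrative only).

    index maps each k-mer to the ids of clusters whose core read contains it.
    A read joins the cluster sharing the most k-mers with it, provided the
    count reaches min_shared; otherwise it becomes the core of a new cluster
    and the index is updated with its k-mers.
    """
    index = defaultdict(set)
    clusters = []  # clusters[i] is the list of reads in cluster i
    for read in reads:
        votes = Counter()
        for km in kmers(read, k):
            for cid in index.get(km, ()):
                votes[cid] += 1
        best = votes.most_common(1)
        if best and best[0][1] >= min_shared:
            clusters[best[0][0]].append(read)
        else:
            cid = len(clusters)
            clusters.append([read])
            for km in kmers(read, k):
                index[km].add(cid)
    return clusters


# Hypothetical reads: two strands, each with one noisy copy.
reads = [
    "ACGTACGTTAGC",  # strand A
    "ACGTACGTTAGG",  # strand A with one substitution at the end
    "TTGACCATGGAA",  # strand B
    "TTGACCTTGGAA",  # strand B with one substitution in the middle
]
clusters = cluster_reads(reads)
print(clusters)
```

With these toy reads, each noisy copy shares enough k-mers with its source strand to land in the same cluster. A real pipeline would also have to handle insertions, deletions, and far larger read sets.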
Affiliation(s)
- Penghao Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China
| | - Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, 116024, Dalian, China
| | - Tao Ma
- Brain Function Research Section, The First Hospital of China Medical University, 110001, Shenyang, China
| | - Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China.
| | - Qiang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, 116622, Dalian, China
| | - Pan Zheng
- Department of Accounting and Information Systems, University of Canterbury, 8140, Christchurch, New Zealand
|
37
|
Bögels BWA, Nguyen BH, Ward D, Gascoigne L, Schrijver DP, Makri Pistikou AM, Joesaar A, Yang S, Voets IK, Mulder WJM, Phillips A, Mann S, Seelig G, Strauss K, Chen YJ, de Greef TFA. DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access. NATURE NANOTECHNOLOGY 2023; 18:912-921. [PMID: 37142708 PMCID: PMC10427423 DOI: 10.1038/s41565-023-01377-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 03/19/2023] [Indexed: 05/06/2023]
Abstract
DNA has emerged as an attractive medium for archival data storage due to its durability and high information density. Scalable parallel random access to information is a desirable property of any storage system. For DNA-based storage systems, however, this still needs to be robustly established. Here we report on a thermoconfined polymerase chain reaction, which enables multiplexed, repeated random access to compartmentalized DNA files. The strategy is based on localizing biotin-functionalized oligonucleotides inside thermoresponsive, semipermeable microcapsules. At low temperatures, microcapsules are permeable to enzymes, primers and amplified products, whereas at high temperatures, membrane collapse prevents molecular crosstalk during amplification. Our data show that the platform outperforms non-compartmentalized DNA storage for repeated random access and reduces amplification bias tenfold during multiplex polymerase chain reaction. Using fluorescent sorting, we also demonstrate sample pooling and data retrieval by microcapsule barcoding. Therefore, the thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access to archival DNA files.
Affiliation(s)
- Bas W A Bögels
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Bichlien H Nguyen
- Microsoft, Redmond, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - David Ward
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Levena Gascoigne
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Laboratory of Self-Organizing Soft Matter, Department of Chemical Engineering and Chemistry, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - David P Schrijver
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Anna-Maria Makri Pistikou
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Alex Joesaar
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Shuo Yang
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Ilja K Voets
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Laboratory of Self-Organizing Soft Matter, Department of Chemical Engineering and Chemistry, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Willem J M Mulder
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
| | | | - Stephen Mann
- Centre for Protolife Research and Centre for Organized Matter Chemistry, School of Chemistry, University of Bristol, Bristol, UK
- School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, People's Republic of China
| | - Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Department of Electrical Engineering, University of Washington, Seattle, WA, USA
| | - Karin Strauss
- Microsoft, Redmond, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Yuan-Jyue Chen
- Microsoft, Redmond, WA, USA.
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
| | - Tom F A de Greef
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands.
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Molecules and Materials, Radboud University, Nijmegen, The Netherlands.
- Center for Living Technologies, Eindhoven-Wageningen-Utrecht Alliance, Utrecht, The Netherlands.
|
38
|
Zhang XE, Liu C, Dai J, Yuan Y, Gao C, Feng Y, Wu B, Wei P, You C, Wang X, Si T. Enabling technology and core theory of synthetic biology. SCIENCE CHINA. LIFE SCIENCES 2023; 66:1742-1785. [PMID: 36753021 PMCID: PMC9907219 DOI: 10.1007/s11427-022-2214-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 10/04/2022] [Indexed: 02/09/2023]
Abstract
Synthetic biology provides a new paradigm for life science research ("build to learn") and opens the future journey of biotechnology ("build to use"). Here, we discuss advances of various principles and technologies in the mainstream of the enabling technology of synthetic biology, including synthesis and assembly of a genome, DNA storage, gene editing, molecular evolution and de novo design of function proteins, cell and gene circuit engineering, cell-free synthetic biology, artificial intelligence (AI)-aided synthetic biology, as well as biofoundries. We also introduce the concept of quantitative synthetic biology, which is guiding synthetic biology towards increased accuracy and predictability or the real rational design. We conclude that synthetic biology will establish its disciplinary system with the iterative development of enabling technologies and the maturity of the core theory.
Affiliation(s)
- Xian-En Zhang
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Chenli Liu
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Junbiao Dai
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Caixia Gao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Yan Feng
- State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Bian Wu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Ping Wei
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Chun You
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
| | - Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China.
| | - Tong Si
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
|
39
|
Zhang X, Liu X, Yao Y, Liu Y, Zeng C, Zhang Q. Programmable Molecular Signal Transmission Architecture and Reactant Regeneration Strategy Driven by EXO λ for DNA Circuits. ACS Synth Biol 2023; 12:2107-2117. [PMID: 37405388 DOI: 10.1021/acssynbio.3c00168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/06/2023]
Abstract
The characteristics of DNA hybridization enable molecular computing through strand displacement reactions, facilitating the construction of complex DNA circuits, which is an important way to realize information interaction and processing at the molecular level. However, signal attenuation in the cascade and shunt processes hinders the reliability of the calculation results and the further expansion of DNA circuit scale. Here, we demonstrate a novel programmable exonuclease-assisted signal transmission architecture, in which a DNA strand with a toehold, employed to inhibit hydrolysis by EXO λ, is applied in DNA circuits. We construct a series circuit with variable resistance and a parallel circuit with a constant current source, ensuring excellent orthogonality between input and output sequences while maintaining low leakage (<5%) during the reaction. Additionally, a simple and flexible exonuclease-driven reactant regeneration (EDRR) strategy is proposed and applied to construct parallel circuits with constant voltage sources that can amplify the output signal without extra DNA fuel strands or energy. Furthermore, we demonstrate the effectiveness of the EDRR strategy in reducing signal attenuation during cascade and shunt processes by constructing a four-node DNA circuit. These findings offer a new approach to enhance the reliability of molecular computing systems and expand the scale of DNA circuits in the future.
Affiliation(s)
- Xun Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xin Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yao Yao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Chenyi Zeng
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
40
|
Yang X, Shi X, Lai L, Chen C, Xu H, Deng M. Towards long double-stranded chains and robust DNA-based data storage using the random code system. Front Genet 2023; 14:1179867. [PMID: 37384333 PMCID: PMC10294226 DOI: 10.3389/fgene.2023.1179867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 05/31/2023] [Indexed: 06/30/2023] Open
Abstract
DNA has become a popular choice for next-generation storage media due to its high storage density and stability. As the storage medium of life's information, DNA has significant storage capacity and low-cost, low-power replication and transcription capabilities. However, utilizing long double-stranded DNA for storage can introduce unstable factors that make it difficult to meet the constraints of biological systems. To address this challenge, we have designed a highly robust coding scheme called the "random code system," inspired by the idea of fountain codes. The random code system includes the establishment of a random matrix, Gaussian preprocessing, and random equilibrium. Compared to Luby transform codes (LT codes), random code (RC) has better robustness and recovery ability of lost information. In biological experiments, we successfully stored 29,390 bits of data in 25,700 bp chains, achieving a storage density of 1.78 bits per nucleotide. These results demonstrate the potential for using long double-stranded DNA and the random code system for robust DNA-based data storage.
Affiliation(s)
- Xu Yang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Xiaolong Shi
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Langwen Lai
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Congzhou Chen
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Huaisheng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Ming Deng
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
|
41
|
Buko T, Tuczko N, Ishikawa T. DNA Data Storage. BIOTECH 2023; 12:44. [PMID: 37366792 DOI: 10.3390/biotech12020044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 05/22/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023] Open
Abstract
The demand for data storage is growing at an unprecedented rate, and current methods are not sufficient to accommodate such rapid growth due to their cost, space requirements, and energy consumption. Therefore, there is a need for a new, long-lasting data storage medium with high capacity, high data density, and high durability against extreme conditions. DNA is one of the most promising next-generation data carriers, with a storage density of 10¹⁹ bits of data per cubic centimeter, and its three-dimensional structure makes it about eight orders of magnitude denser than other storage media. DNA amplification during PCR or replication during cell proliferation enables the quick and inexpensive copying of vast amounts of data. In addition, DNA can possibly endure millions of years if stored in optimal conditions and dehydrated, making it useful for data storage. Numerous space experiments on microorganisms have also proven their extraordinary durability in extreme conditions, which suggests that DNA could be a durable storage medium for data. Despite some remaining challenges, such as the need to refine methods for the fast and error-free synthesis of oligonucleotides, DNA is a promising candidate for future data storage.
Affiliation(s)
- Tomasz Buko
- Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, Miecznikowa 1, PL-02-096 Warsaw, Poland
| | - Nella Tuczko
- Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, Miecznikowa 1, PL-02-096 Warsaw, Poland
| | - Takao Ishikawa
- Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, Miecznikowa 1, PL-02-096 Warsaw, Poland
|
42
|
Jung YJ, Kim H, Cheong HK, Lim YB. Magnetic control of self-assembly and disassembly in organic materials. Nat Commun 2023; 14:3081. [PMID: 37248227 DOI: 10.1038/s41467-023-38846-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 05/18/2023] [Indexed: 05/31/2023] Open
Abstract
Because organic molecules and materials are generally insensitive or weakly sensitive to magnetic fields, a certain means to enhance their magnetic responsiveness needs to be exploited. Here we show a strategy to amplify the magnetic responsiveness of self-assembled peptide nanostructures by synergistically combining the concepts of perfect α-helix and rod-coil supramolecular building blocks. Firstly, we develop a monomeric, nonpolar, and perfect α-helix (MNP-helix). Then, we employ the MNP-helix as the rod block of rod-coil amphiphiles (rod-coils) because rod-coils are well-suited for fabricating responsive assemblies. We show that the self-assembly processes of the designed rod-coils and disassembly of rod-coil/DNA complexes can be controlled in a magnetically responsive manner using the relatively weak magnetic field provided by the ordinary neodymium magnet [0.07 ~ 0.25 Tesla (T)]. These results demonstrate that magnetically responsive organic assemblies usable under practical conditions can be realized by using rod-coil supramolecular building blocks containing constructively organized diamagnetic moieties.
Affiliation(s)
- You-Jin Jung
- Department of Materials Science & Engineering, Yonsei University, 50 Yonsei-ro, Seoul, 03722, Republic of Korea
| | - Hyoseok Kim
- Department of Materials Science & Engineering, Yonsei University, 50 Yonsei-ro, Seoul, 03722, Republic of Korea
| | - Hae-Kap Cheong
- Division of Magnetic Resonance, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
| | - Yong-Beom Lim
- Department of Materials Science & Engineering, Yonsei University, 50 Yonsei-ro, Seoul, 03722, Republic of Korea.
|
43
|
Lau B, Chandak S, Roy S, Tatwawadi K, Wootters M, Weissman T, Ji HP. Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing. Sci Rep 2023; 13:8514. [PMID: 37231057 DOI: 10.1038/s41598-023-29575-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 02/07/2023] [Indexed: 05/27/2023] Open
Abstract
The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.
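The exponential scaling of the address space can be illustrated with a short calculation. The Python sketch below assumes a simplified model that is not taken from the paper: an address is built from a fixed number of slots, and each slot is filled with one of a small set of primers. Under that assumption the number of primers grows linearly with the number of slots, while the number of addresses grows exponentially. The function names and the primer labels are hypothetical.

```python
from itertools import product


def address_space(primers_per_position, positions):
    """Return (primers needed, distinct addresses) for the simplified model."""
    primers_needed = primers_per_position * positions
    addresses = primers_per_position ** positions
    return primers_needed, addresses


def enumerate_addresses(primers_per_position, positions):
    """List every address as a tuple of hypothetical primer labels."""
    slots = [
        [f"P{pos}_{i}" for i in range(primers_per_position)]
        for pos in range(positions)
    ]
    return list(product(*slots))


for positions in (2, 4, 8):
    primers, addresses = address_space(4, positions)
    print(f"{positions} positions: {primers} primers -> {addresses} addresses")
```

In this toy model, doubling the number of slots from 4 to 8 doubles the primer count from 16 to 32 but raises the address count from 256 to 65,536.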
Affiliation(s)
- Billy Lau
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
| | - Shubham Chandak
- Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
| | - Sharmili Roy
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Kedar Tatwawadi
- Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
| | - Mary Wootters
- Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
| | - Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA.
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA.
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA.
|
44
|
He Z, Shi K, Li J, Chao J. Self-assembly of DNA origami for nanofabrication, biosensing, drug delivery, and computational storage. iScience 2023; 26:106638. [PMID: 37187699 PMCID: PMC10176269 DOI: 10.1016/j.isci.2023.106638] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 05/17/2023] Open
Abstract
Since the pioneering work of immobile DNA Holliday junction by Ned Seeman in the early 1980s, the past few decades have witnessed the development of DNA nanotechnology. In particular, DNA origami has pushed the field of DNA nanotechnology to a new level. It obeys the strict Watson-Crick base pairing principle to create intricate structures with nanoscale accuracy, which greatly enriches the complexity, dimension, and functionality of DNA nanostructures. Benefiting from its high programmability and addressability, DNA origami has emerged as versatile nanomachines for transportation, sensing, and computing. This review will briefly summarize the recent progress of DNA origami, two-dimensional pattern, and three-dimensional assembly based on DNA origami, followed by introduction of its application in nanofabrication, biosensing, drug delivery, and computational storage. The prospects and challenges of assembly and application of DNA origami are also discussed.
Affiliation(s)
- Zhimei He
- Key Laboratory for Organic Electronics & Information Displays (KLOEID), Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials (IAM) and School of Materials Science and Engineering, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
- Smart Health Big Data Analysis and Location Services Engineering Research Center of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
| | - Kejun Shi
- Key Laboratory for Organic Electronics & Information Displays (KLOEID), Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials (IAM) and School of Materials Science and Engineering, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
| | - Jinggang Li
- Key Laboratory for Organic Electronics & Information Displays (KLOEID), Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials (IAM) and School of Materials Science and Engineering, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
| | - Jie Chao
- Key Laboratory for Organic Electronics & Information Displays (KLOEID), Jiangsu Key Laboratory for Biosensors, Institute of Advanced Materials (IAM) and School of Materials Science and Engineering, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
- Smart Health Big Data Analysis and Location Services Engineering Research Center of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
- Corresponding author
|
45
|
Seo SY, Min S, Lee S, Seo JH, Park J, Kim HK, Song M, Baek D, Cho SR, Kim HH. Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s. Nat Methods 2023:10.1038/s41592-023-01875-2. [PMID: 37188955 DOI: 10.1038/s41592-023-01875-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 04/10/2023] [Indexed: 05/17/2023]
Abstract
Recently, various small Cas9 orthologs and variants have been reported for use in in vivo delivery applications. Although small Cas9s are particularly suited for this purpose, selecting the optimal small Cas9 for a specific target sequence remains challenging. Here, to this end, we have systematically compared the activities of 17 small Cas9s for thousands of target sequences. For each small Cas9, we have characterized the protospacer adjacent motif and determined the optimal single guide RNA expression format and scaffold sequence. High-throughput comparative analyses revealed distinct high- and low-activity groups of small Cas9s. We also developed DeepSmallCas9, a set of computational models predicting the activities of the small Cas9s at matched and mismatched target sequences. Together, this analysis and these computational models provide a useful guide for researchers to select the most suitable small Cas9 for specific applications.
Affiliation(s)
- Sang-Yeon Seo
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
| | | | - Sungtae Lee
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jung Hwa Seo
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jinman Park
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Hui Kwon Kim
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
- Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Graduate Program of Nano Biomedical Engineering (NanoBME), Advanced Science Institute, Yonsei University, Seoul, Republic of Korea
- Department of Integrative Biotechnology, Sungkyunkwan University, Suwon, Republic of Korea
| | - Myungjae Song
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Dawoon Baek
- Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Rehabilitation Medicine, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea
| | - Sung-Rae Cho
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate Program of Biomedical Engineering, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Hyongbum Henry Kim
- Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
- Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Graduate Program of Nano Biomedical Engineering (NanoBME), Advanced Science Institute, Yonsei University, Seoul, Republic of Korea
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
| |
|
46
|
Xu C, Ma B, Dong X, Lei L, Hao Q, Zhao C, Liu H. Assembly of Reusable DNA Blocks for Data Storage Using the Principle of Movable Type Printing. ACS APPLIED MATERIALS & INTERFACES 2023; 15:24097-24108. [PMID: 37184884 DOI: 10.1021/acsami.3c01860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Due to its high coding density and longevity, DNA is a compelling data storage alternative. However, current DNA data storage systems rely on the de novo synthesis of enormous DNA molecules, resulting in low data editability, high synthesis costs, and restrictions on further applications. Here, we demonstrate the programmable assembly of reusable DNA blocks for versatile data storage using the ancient movable type printing principle. Digital data are first encoded into nucleotide sequences in DNA hairpins, which are then synthesized and immobilized on solid beads as modular DNA blocks. Using DNA polymerase-catalyzed primer exchange reaction, data can be continuously replicated from hairpins on DNA blocks and attached to a primer in tandem to produce new information. The assembly of DNA blocks is highly programmable, producing various data by reusing a finite number of DNA blocks and reducing synthesis costs (∼1718 versus 3000 to 30,000 US$ per megabyte using conventional methods). We demonstrate the flexible assembly of texts, images, and random numbers using DNA blocks and the integration with DNA logic circuits to manipulate data synthesis. This work suggests a flexible paradigm by recombining already synthesized DNA to build cost-effective and intelligent DNA data storage systems.
Affiliation(s)
- Chengtao Xu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Biao Ma
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Xing Dong
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Lanjie Lei
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Qing Hao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Chao Zhao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Hong Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| |
|
47
|
Fei Z, Gupta N, Li M, Xiao P, Hu X. Toward highly effective loading of DNA in hydrogels for high-density and long-term information storage. SCIENCE ADVANCES 2023; 9:eadg9933. [PMID: 37163589 PMCID: PMC10171811 DOI: 10.1126/sciadv.adg9933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Digital information, when converted into a DNA sequence, provides dense, stable, energy-efficient, and sustainable data storage. The most stable method for encapsulating DNA has been in an inorganic matrix of silica, iron oxide, or both, but such matrices are limited by low DNA uptake and complex recovery techniques. This study investigated a rationally designed thermally responsive functionally graded (TRFG) hydrogel as a simple and cost-effective method for storing DNA. The TRFG hydrogel shows high DNA uptake, long-term protection, and reusability due to nondestructive DNA extraction. The high loading capacity was achieved by directly absorbing DNA from the solution, which is then retained because of its interaction with a hyperbranched cationic polymer loaded into a negatively charged hydrogel matrix used as a support and because of its thermoresponsive nature, which allows DNA concentration within the hydrogel through multiple swelling/deswelling cycles. We were able to achieve a high DNA data density of 7.0 × 10⁹ gigabytes per gram using a hydrogel-based system.
Affiliation(s)
- Zhongjie Fei
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- School of Material Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
| | - Nupur Gupta
- School of Material Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore 639798, Singapore
- Environmental Chemistry and Materials Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, Singapore 637141, Singapore
| | - Mengjie Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
| | - Pengfeng Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
| | - Xiao Hu
- School of Material Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Environmental Chemistry and Materials Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, Singapore 637141, Singapore
| |
|
48
|
El-Shaikh A, Seeger B. Content-based filter queries on DNA data storage systems. Sci Rep 2023; 13:7053. [PMID: 37120614 PMCID: PMC10148835 DOI: 10.1038/s41598-023-34160-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 04/25/2023] [Indexed: 05/01/2023] Open
Abstract
Recent developments in DNA data storage systems have revealed the great potential to store large amounts of data at a very high density with extremely long persistence and low cost. However, despite recent contributions to robust data encoding, current DNA storage systems offer limited support for random access on DNA storage devices due to restrictive biochemical constraints. Moreover, state-of-the-art approaches do not support content-based filter queries on DNA storage. This paper introduces the first encoding for DNA that enables content-based searches on structured data like relational database tables. We provide the details of the methods for coding and decoding millions of directly accessible data objects on DNA. We evaluate the derived codes on real data sets and verify their robustness.
Affiliation(s)
- Alex El-Shaikh
- Department of Mathematics and Computer Science, University of Marburg, 35037, Marburg, Germany
| | - Bernhard Seeger
- Department of Mathematics and Computer Science, University of Marburg, 35037, Marburg, Germany
| |
|
49
|
Mortuza GM, Guerrero J, Llewellyn S, Tobiason MD, Dickinson GD, Hughes WL, Zadegan R, Andersen T. In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). BMC Bioinformatics 2023; 24:160. [PMID: 37085766 PMCID: PMC10120115 DOI: 10.1186/s12859-023-05264-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 03/30/2023] [Indexed: 04/23/2023] Open
Abstract
Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.
Affiliation(s)
- Golam Md Mortuza
- Department of Computer Science, Boise State University, Boise, Idaho, USA
| | - Jorge Guerrero
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC, USA
| | - William L Hughes
- School of Engineering, Kelowna, University of British Columbia, Kelowna, British Columbia, Canada
| | - Reza Zadegan
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Tim Andersen
- Department of Computer Science, Boise State University, Boise, Idaho, USA
| |
|
50
|
Talbot H, Halvorsen K, Chandrasekaran AR. Encoding, Decoding, and Rendering Information in DNA Nanoswitch Libraries. ACS Synth Biol 2023; 12:978-983. [PMID: 36541933 PMCID: PMC10121895 DOI: 10.1021/acssynbio.2c00649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
DNA-based construction allows the creation of molecular devices that are useful in information storage and processing. Here, we combine the programmability of DNA nanoswitches and stimuli-responsive conformational changes to demonstrate information encoding and graphical readout using gel electrophoresis. We encoded information as 5-bit binary codes for alphanumeric characters using a combination of DNA and RNA inputs that can be decoded using molecular stimuli such as a ribonuclease. We also show that a similar strategy can be used for graphical visual readout of alphabets on an agarose gel, information that is encoded by nucleic acids and decoded by a ribonuclease. Our method of information encoding and processing could be combined with DNA actuation for molecular computation and diagnostics that require a nonarbitrary visual readout.
Affiliation(s)
- Hannah Talbot
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States
| | - Ken Halvorsen
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States
| | - Arun Richard Chandrasekaran
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States
| |
|