1. Shen P, Zheng Y, Zhang C, Li S, Chen Y, Chen Y, Liu Y, Cai Z. DNA storage: The future direction for medical cold data storage. Synth Syst Biotechnol 2025; 10:677-695. [PMID: 40235856 PMCID: PMC11999466 DOI: 10.1016/j.synbio.2025.03.006] [Received: 12/11/2024] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 04/17/2025] Open
Abstract
DNA storage, characterized by its durability, data density, and cost-effectiveness, is a promising solution for managing the increasing data volumes in healthcare. This review explores state-of-the-art DNA storage technologies and provides insights into designing a DNA storage system tailored for medical cold data. We anticipate that a practical approach for medical cold data storage will involve establishing regional, in vitro DNA storage centers that can serve multiple hospitals. The immediacy of DNA storage for medical data hinges on the development of novel, high-density, specialized coding methods. Established commercial techniques, such as DNA chemical synthesis and next-generation sequencing (NGS), along with mixed drying with alkaline salts and refined Polymerase Chain Reaction (PCR), potentially represent the optimal options for data writing, reading, storage, and accessing, respectively. Data security could be ensured by integrating traditional digital encryption with DNA steganography. Although breakthrough developments like artificial nucleotides and DNA nanostructures show potential, they remain in the laboratory research phase. In conclusion, DNA storage is a viable preservation strategy for medical cold data in the near future.
Affiliation(s)
- Peilin Shen
- Department of Urology, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yukui Zheng
- The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- CongYu Zhang
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shuo Li
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, PR China
- BGI-Shenzhen, Shenzhen, Guangdong Province, PR China
- BGI Hospital Groups, Ltd., Shenzhen, Guangdong Province, PR China
- Yongru Chen
- Department of Emergency Intensive Care Unit, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yongsong Chen
- Department of Endocrinology, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yuchen Liu
- Shenzhen Institute of Translational Medicine, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Health Science Center, Shenzhen University, Shenzhen, Guangdong Province, PR China
- Key Laboratory of Medical Reprogramming Technology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Guangdong Province, PR China
- Zhiming Cai
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Key Laboratory of Medical Reprogramming Technology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Guangdong Key Laboratory of Systems Biology and Synthetic Biology for Urogenital Tumors, Shenzhen, Guangdong Province, PR China
- State Engineering Laboratory of Medical Key Technologies Application of Synthetic Biology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Carson International Cancer Center of Shenzhen University, Shenzhen, Guangdong Province, PR China
2. Qu G, Yan Z, Chen X, Wu H. DNA data storage for biomedical images using HELIX. Nature Computational Science 2025:10.1038/s43588-025-00793-x. [PMID: 40360759 DOI: 10.1038/s43588-025-00793-x] [Received: 06/06/2024] [Accepted: 03/18/2025] [Indexed: 05/15/2025]
Abstract
Deoxyribonucleic acid (DNA) data storage is expected to become a key medium for large-scale data. Biomedical data images typically require substantial storage space over extended periods, making them ideal candidates for DNA data storage. However, existing DNA data storage models are primarily designed for generic files and lack a comprehensive retrieval system for biomedical images. Here, to address this, we propose HELIX, a DNA-based storage system for biomedical images. HELIX introduces an image-compression algorithm tailored to the characteristics of biomedical images, achieving high compression rates and robust error tolerance. In addition, HELIX incorporates an error-correcting encoding algorithm that eliminates the need for indexing, enhancing storage density and decoding speed. We utilize a deep learning-based image repair algorithm for the predictive restoration of partially missing image blocks. In our in vitro experiments, we successfully stored two spatiotemporal genomics images. This sequencing process achieved 97.20% image quality at a depth of 7× coverage.
Affiliation(s)
- Guanjin Qu
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China
- Zihui Yan
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China
- Xin Chen
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China
- Huaming Wu
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China
3. Bao M, Herdendorf B, Mendonsa G, Chari S, Reddy A. Low-cost and automated magnetic bead-based DNA data writing via digital microfluidics. Lab on a Chip 2025; 25:2030-2042. [PMID: 40070261 DOI: 10.1039/d5lc00106d] [Indexed: 04/09/2025]
Abstract
The rapid growth in data generation presents a significant challenge for conventional storage technologies. DNA storage has emerged as a promising solution, offering substantially greater storage density and durability. However, the current DNA data writing process is costly and labor-intensive, hindering the commercialization of DNA data storage. In this study, we present a digital microfluidics (DMF) platform integrated with E47 DNAzyme ligation chemistry to develop a programmable, cost-effective, and automated DNA data writing process. Our method utilizes pre-synthesized single-stranded DNA as building blocks, which can be assembled into diverse DNA sequences that encode desired data. By employing DNAzymes as biocatalysts, we enable an enzyme-free ligation process at room temperature, significantly reducing costs compared to traditional enzyme-based methods. Our proof-of-concept demonstrates an automated DNA writing process with reduced reagent input, providing an alternative solution to the high costs associated with current DNA data storage methods. The high specificity of ligation using DNAzymes obviates the need for storing each unique DNA block in its own reservoir, which greatly reduces the total number of reservoirs required to store the starting material. This simplifies the overall layout and the associated plumbing of the DMF platform. To adapt ligation, which conventionally requires column purification, to the DMF platform, we introduce a DNAzyme-cleavage-assisted bead purification assay. This method employs 17E DNAzymes to cleave and release biotinylated DNA from streptavidin beads, followed by a one-pot ligation with E47 DNAzymes to assemble the desired DNA strands. Our study represents a significant advancement in DNA data storage technology, offering a cost-effective and automated solution that enhances scalability and practicality for commercial DNA data storage applications.
Affiliation(s)
- Mengdi Bao
- Seagate Technology LLC, 1280 Disc Dr, Shakopee, MN 55379, USA
- Gemma Mendonsa
- Seagate Technology LLC, 1280 Disc Dr, Shakopee, MN 55379, USA
- Sriram Chari
- Seagate Technology LLC, 1280 Disc Dr, Shakopee, MN 55379, USA
- Anil Reddy
- Seagate Technology LLC, 1280 Disc Dr, Shakopee, MN 55379, USA
4. Wang C, Wei D, Wei Z, Yang D, Xing J, Wang Y, Wang X, Wang P, Ma G, Zhang X, Li H, Tang C, Hou P, Wang J, Gao R, Xie G, Li C, Ju Y, Wang P, Yue L, Zhao Y, Sheng Y, Xiao J, Niu H, Xu S, Yang H, Liu D, Duan B, Bu D, Tan G, Chen F. Cost-Effective DNA Storage System with DNA Movable Type. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2411354. [PMID: 39555674 PMCID: PMC11884572 DOI: 10.1002/advs.202411354] [Received: 09/15/2024] [Revised: 11/06/2024] [Indexed: 11/19/2024]
Abstract
In the face of exponential data growth, DNA-based storage offers a promising solution for preserving big data. However, most existing DNA storage methods, akin to traditional block printing, require costly chemical synthesis for each individual data file, adopting a sequential, one-time-use synthesis approach. To overcome these limitations, a novel, cost-effective "DNA-movable-type storage" system, inspired by movable type printing, is introduced. This system utilizes prefabricated DNA movable types (DNA-MTs): short, double-stranded DNA oligonucleotides encoding specific payload, address, and checksum data. These DNA-MTs are enzymatically ligated/assembled into cohesive sequences, termed "DNA movable type blocks," streamlining the assembly process with the automated BISHENG-1 DNA-MT inkjet printer. Using BISHENG-1, 43.7 KB of data files are successfully printed, assembled, stored, and accurately retrieved in diverse formats (text, image, audio, and video) in vitro and in vivo, using only 350 DNA-MTs. Notably, each DNA-MT, synthesized once (2 OD), can be used up to 10000 times, reducing costs to $122/MB, outperforming existing DNA storage methods. This innovation circumvents the need to synthesize entire DNA sequences encoding files from scratch, offering significant cost and efficiency advantages. Furthermore, it has considerable untapped potential to advance a robust DNA storage system, better meeting the extensive data storage demands of the big-data era.
5. Wu R, Zhang Y, Teng J, Zhang Q, Zhang C. Molecular Circuit-Controlled Nanoparticle Folders for Programmable DNA Information Access. ACS Nano 2025; 19:6918-6928. [PMID: 39945285 DOI: 10.1021/acsnano.4c13882] [Indexed: 02/26/2025]
Abstract
DNA storage has become an attractive option for long-term, stable digital data storage because of its high storage density and strong stability. Recently, numerous efforts have been made to develop DNA data access methods to improve the efficiency and accuracy of molecular data reading. However, most current data access methods rely on well-established polymerase chain reaction (PCR) and DNA hybridization, and dynamic, programmable operations for data access remain largely unexplored. Here, we propose a programmable DNA data access strategy in which the nanoparticle folders are controlled by DNAzyme circuits to achieve specific information manipulation. We experimentally demonstrate three kinds of circuit programs that access specific information using YES, AND, and OR logic. In addition, selective information access was performed by using a DNAzyme circuit to obtain the target information from a DNA data pool. Importantly, we have extended the circuit-controlled framework to multiple manipulation modes, demonstrating four manipulations on two AuNP folders to access different information on demand. The programmable access strategy provides a paradigm for integrated DNA computing and storage systems and has further applications in the fields of molecular computation and DNA data storage.
Affiliation(s)
- Ranfeng Wu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yongpeng Zhang
- School of Control and Computer Engineering, North China Electric Power University, Beijing 100096, China
- Jiongjiong Teng
- School of Control and Computer Engineering, North China Electric Power University, Beijing 100096, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Cheng Zhang
- School of Computer Science, Key Laboratory of High Confidence Software Technologies, Peking University, Beijing 100871, China
6. Liu B, Wang F, Fan C, Li Q. Data Readout Techniques for DNA-Based Information Storage. Advanced Materials (Deerfield Beach, Fla.) 2025:e2412926. [PMID: 39910849 DOI: 10.1002/adma.202412926] [Received: 08/29/2024] [Revised: 01/02/2025] [Indexed: 02/07/2025]
Abstract
DNA is a natural chemical substrate that carries genetic information and also serves as a powerful toolkit for storing digital data. Compared to traditional storage media, DNA molecules offer higher storage density, longer lifespan, and lower maintenance energy consumption. In the DNA storage process, data readout is a critical step that bridges the gap between DNA molecules/structures and the stored digital information. With the continued development of strategies in DNA data storage technology, the readout techniques have evolved. However, there is a lack of systematic introduction and discussion of the readout techniques for reported DNA data storage systems, especially the correlation between the design of the data storage system and the corresponding selection of readout techniques. This review first introduces two main categories of DNA data storage units (i.e., sequence and structure) and their corresponding readout techniques (i.e., sequencing and nonsequencing methods), and then reviews representative examples of notable advancements in DNA data storage technology, focusing on data storage unit design and readout technique selection. It also introduces emerging approaches to assist data readout techniques, such as the implementation of microfluidics and fluorescent probes. Finally, the paper discusses the limitations, challenges, and potential of DNA data readout approaches.
Affiliation(s)
- Bingyi Liu
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Fei Wang
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chunhai Fan
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Qian Li
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
7. Yan Z, Zhang H, Lu B, Han T, Tong X, Yuan Y. DNA palette code for time-series archival data storage. Natl Sci Rev 2025; 12:nwae321. [PMID: 39758123 PMCID: PMC11697981 DOI: 10.1093/nsr/nwae321] [Received: 04/09/2024] [Revised: 08/21/2024] [Accepted: 08/28/2024] [Indexed: 01/07/2025] Open
Abstract
The long-term preservation of large volumes of infrequently accessed cold data poses challenges to the storage community. Deoxyribonucleic acid (DNA) is considered a promising solution due to its inherent physical stability and significant storage density. The information density and decoding sequence coverage are two important metrics that influence the efficiency of DNA data storage. In this study, we propose a novel coding scheme called the DNA palette code, which is suitable for cold data, especially time-series archival datasets. These datasets are not frequently accessed, but require reliable long-term storage for retrospective research. The DNA palette code employs unordered combinations of index-free oligonucleotides to represent binary information. It can achieve high net information density encoding and lossless decoding with low sequencing coverage. When sequencing reads are corrupted, it can still effectively recover partial information, preventing the complete failure of file retrieval. The in vitro testing of clinical brain magnetic resonance imaging (MRI) data storage, as well as simulation validations using large-scale public MRI datasets (10 GB), planetary science datasets and meteorological datasets, demonstrates the advantages of our coding scheme, including high net information density, low decoding sequence coverage and wide applicability.
Affiliation(s)
- Zihui Yan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Haoran Zhang
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Boyuan Lu
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Tong Han
- Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China
- Xiaoguang Tong
- Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China
- Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
8. Volkel KD, Hook PW, Keung A, Timp W, Tuck JM. Nanopore decoding with speed and versatility for data storage. Bioinformatics 2024; 41:btaf006. [PMID: 39777456 PMCID: PMC11755093 DOI: 10.1093/bioinformatics/btaf006] [Received: 06/18/2024] [Revised: 12/18/2024] [Accepted: 01/07/2025] [Indexed: 01/11/2025] Open
Abstract
MOTIVATION: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. We need flexible and efficient designs that match nanopore's capabilities, but relatively few designs have been explored and many have significant inefficiency in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs.
RESULTS: We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. Our evaluation for each decoder is applied to the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state-of-the-art soft decoder can achieve error rates of 2.25%. However, that design also suffers a low throughput of 183 s/read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257× with the trade-off of a higher byte error rate of 3.52% compared to the state of the art. Furthermore, we use the faster speed of our algorithm to explore more design options. We show that read densities of 0.33 bits/base can be achieved, which is 4× larger than prior MSA-based decoders. We also compare RNA to DNA, and find that RNA has 85% as many error-free reads when compared to DNA.
AVAILABILITY AND IMPLEMENTATION: Source code for our soft decoder and data used to generate figures are publicly available in the GitHub repository https://github.com/dna-storage/hedges-soft-decoder (10.5281/zenodo.11454877). All raw FAST5/FASTQ data are available at 10.5281/zenodo.11985454 and 10.5281/zenodo.12014515.
Affiliation(s)
- Kevin D Volkel
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
- Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
- Albert Keung
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
- James M Tuck
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
9. Şatır E. A DNA Data Storage Method Using Spatial Encoding Based Lossless Compression. Entropy (Basel, Switzerland) 2024; 26:1116. [PMID: 39766746 PMCID: PMC11675758 DOI: 10.3390/e26121116] [Received: 11/08/2024] [Revised: 12/11/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
With the rapid increase in global data and rapid development of information technology, DNA sequences have been collected and manipulated on computers. This has yielded a new and attractive field of bioinformatics, DNA storage, where DNA has been considered as a great potential storage medium. It is known that one gram of DNA can store 215 GB of data, and the data stored in the DNA can be preserved for tens of thousands of years. In this study, a lossless and reversible DNA data storage method was proposed. The proposed approach employs a vector representation of each DNA base in a two-dimensional (2D) spatial domain for both encoding and decoding. The structure of the proposed method is reversible, rendering the decompression procedure possible. Experiments were performed to investigate the capacity, compression ratio, stability, and reliability. The obtained results show that the proposed method is much more efficient in terms of capacity than other known algorithms in the literature.
Affiliation(s)
- Esra Şatır
- Computer Engineering Department, Düzce University, 81620 Düzce, Turkey
10. Li K, Chen H, Li D, Yang C, Zhang H, Zhu Z. Empowering DNA-Based Information Processing: Computation and Data Storage. ACS Applied Materials & Interfaces 2024; 16:68749-68771. [PMID: 39648356 DOI: 10.1021/acsami.4c13948] [Indexed: 12/10/2024]
Abstract
Information processing is a critical topic in the digital age, as silicon-based circuits face unprecedented challenges such as data explosion, immense energy consumption, and approaching physical limits. Deoxyribonucleic acid (DNA), naturally selected as a carrier for storing and using genetic information, possesses unique advantages for information processing, which has given rise to the emerging fields of DNA computing and DNA data storage. To meet the growing practical demands, a wide variety of materials and interfaces have been introduced into DNA information processing technologies, leading to significant advancements. This review summarizes the advances in materials and interfaces that facilitate DNA computation and DNA data storage. We begin with a brief overview of the fundamental functions and principles of DNA computation and DNA data storage. Subsequently, we delve into DNA computing systems based on various materials and interfaces, including microbeads, nanomaterials, DNA nanostructures, hydrophilic-hydrophobic compartmentalization, hydrogels, metal-organic frameworks, and microfluidics. We also explore DNA data storage systems, encompassing encapsulation materials, microfluidics techniques, DNA nanostructures, and living cells. Finally, we discuss the current bottlenecks and obstacles in the fields and provide insights into potential future developments.
Affiliation(s)
- Kunjie Li
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Heng Chen
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Dayang Li
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Chaoyong Yang
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Huimin Zhang
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Zhi Zhu
- Key Laboratory of Spectrochemical Analysis and Instrumentation, Ministry of Education, State Key Laboratory of Physical Chemistry of Solid Surfaces, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Electronic Engineering, School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
11. Bi K, Xu Q, Lai X, Zhao X, Lu Z. Multi-file dynamic compression method based on classification algorithm in DNA storage. Med Biol Eng Comput 2024; 62:3623-3635. [PMID: 38922373 DOI: 10.1007/s11517-024-03156-2] [Received: 11/09/2023] [Accepted: 06/17/2024] [Indexed: 06/27/2024]
Abstract
The exponential growth in data volume has necessitated the adoption of alternative storage solutions, and DNA storage stands out as the most promising solution. However, the exorbitant costs associated with synthesis and sequencing have impeded its development. Pre-compressing the data is recognized as one of the most effective approaches for reducing storage costs. However, different compression methods yield varying compression ratios for the same file, and compressing a large number of files with a single method may not achieve the maximum compression ratio. This study proposes a multi-file dynamic compression method based on machine learning classification algorithms that selects the appropriate compression method for each file to minimize the amount of data stored in DNA. Firstly, four different compression methods are applied to the collected files. Subsequently, the optimal compression method for each file is used as the label, and the file type and size are used as features, to train seven machine learning classification algorithms. The results demonstrate that k-nearest neighbor outperforms the other machine learning algorithms on the validation set and test set most of the time, achieving an accuracy rate of over 85% and showing less volatility. Additionally, a compression rate of 30.85% can be achieved with the k-nearest neighbor model, an improvement of more than 4.5% over the traditional single compression method, resulting in significant cost savings for DNA storage in the range of $0.48 to 3 billion/TB. In comparison to the traditional compression method, the multi-file dynamic compression method demonstrates a more significant compression effect when compressing multiple files. Therefore, it can considerably decrease the cost of DNA storage and facilitate the widespread implementation of DNA storage technology.
Affiliation(s)
- Kun Bi
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China.
- Qi Xu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Xin Lai
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Xiangwei Zhao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Zuhong Lu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
12
|
Chu L, Su Y, Zan X, Lin W, Yao X, Xu P, Liu W. A Deniable Encryption Method for Modulation-Based DNA Storage. Interdiscip Sci 2024; 16:872-881. [PMID: 39155324 DOI: 10.1007/s12539-024-00648-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in synthesis and sequencing techniques have made deoxyribonucleic acid (DNA) a promising alternative for next-generation digital storage. As it approaches practical application, ensuring the security of DNA-stored information has become a critical problem. Deniable encryption allows the decryption of different information from the same ciphertext, ensuring that "plausible" fake information can be provided when users are coerced to reveal the real information. In this paper, we propose a deniable encryption method that uniquely leverages DNA noise channels. Specifically, true and fake messages are encrypted by two similar modulation carriers and subsequently obfuscated by inherent errors. Experimental results demonstrate that our method not only conceals true information among fake information indistinguishably, but also allows both the coercive adversary and the legitimate receiver to decrypt the intended information accurately. Further security analysis validates the resistance of our method against various typical attacks. Compared with conventional DNA cryptography methods based on complex biological operations, our method offers superior practicality and reliability, positioning it as an ideal solution for data encryption in future large-scale DNA storage applications.
Affiliation(s)
- Ling Chu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Yanqing Su
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Xiangzhen Zan
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Wanmin Lin
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Xiangyu Yao
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Peng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, 558000, China.
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, 510000, China.
- Wenbin Liu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, 510000, China.
13
|
Bar-Lev D, Sabary O, Yaakobi E. The zettabyte era is in our DNA. Nat Comput Sci 2024; 4:813-817. [PMID: 39516373 DOI: 10.1038/s43588-024-00717-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 10/03/2024] [Indexed: 11/16/2024]
Abstract
This Perspective surveys the critical computational challenges associated with in vitro DNA-based data storage. As digital data expand exponentially, traditional storage media are becoming less viable, making DNA a promising solution due to its density and durability. However, numerous obstacles remain, including error correction, data retrieval from large volumes of noisy reads, and scalability. The Perspective also highlights challenges for DNA-based data centers, such as fault tolerance, random access, and data removal, which must be addressed to make DNA-based storage practical.
Affiliation(s)
- Daniella Bar-Lev
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
- Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
14
|
Rasool A, Hong J, Hong Z, Li Y, Zou C, Chen H, Qu Q, Wang Y, Jiang Q, Huang X, Dai J. An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024; 8:e2301585. [PMID: 38807543 DOI: 10.1002/smtd.202301585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 03/29/2024] [Indexed: 05/30/2024]
Abstract
DNA-based data storage is a new technology in computational and synthetic biology that offers a solution for long-term, high-density data archiving. Given the critical importance of medical data in advancing human health, there is a growing interest in developing an effective medical data storage system based on DNA. Data integrity, accuracy, reliability, and efficient retrieval are all significant concerns. Therefore, this study proposes an Effective DNA Storage (EDS) approach for archiving medical MRI data. The EDS approach incorporates three key components: (i) a novel fraction strategy to address the critical issue of rotating encoding, which often leads to data loss due to single-base error propagation; (ii) a novel rule-based quaternary transcoding method that satisfies bio-constraints and ensures reliable mapping; and (iii) an indexing technique designed to simplify random search and access. The effectiveness of this approach is validated through computer simulations and biological experiments, confirming its practicality. The EDS approach outperforms existing methods, providing superior control over bio-constraints and reducing computational time. The results and code provided in this study open new avenues for practical DNA storage of medical MRI data, offering promising prospects for the future of medical data archiving and retrieval.
Affiliation(s)
- Abdur Rasool
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
- Jingwei Hong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Zhiling Hong
- Quanzhou Development Group Co., Ltd, Quanzhou, 362000, China
- Yuanzhen Li
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Chao Zou
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, China
- Qiang Qu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yang Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingshan Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaoluo Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Junbiao Dai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518055, China
15
|
Jo S, Shin H, Joe SY, Baek D, Park C, Chun H. Recent progress in DNA data storage based on high-throughput DNA synthesis. Biomed Eng Lett 2024; 14:993-1009. [PMID: 39220021 PMCID: PMC11362454 DOI: 10.1007/s13534-024-00386-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 04/24/2024] [Accepted: 04/26/2024] [Indexed: 09/04/2024] Open
Abstract
DNA data storage has emerged as a solution for storing massive volumes of data by utilizing nucleic acids as a digital information medium. DNA offers exceptionally high storage density, long durability, and low maintenance costs compared to conventional storage media such as flash memory and hard disk drives. DNA data storage consists of the following steps: encoding, DNA synthesis (i.e., writing), preservation, retrieval, DNA sequencing (i.e., reading), and decoding. Out of these steps, DNA synthesis presents a bottleneck due to imperfect coupling efficiency, low throughput, and excessive use of organic solvents. Overcoming these challenges is essential to establish DNA as a viable data storage medium. In this review, we provide the overall process of DNA data storage, presenting the recent progress of each step. Next, we examine a detailed overview of DNA synthesis methods with an emphasis on their limitations. Lastly, we discuss the efforts to overcome the constraints of each method and their prospects.
Affiliation(s)
- Seokwoo Jo
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Haewon Shin
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Sung-yune Joe
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- David Baek
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Chaewon Park
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Honggu Chun
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
16
|
Gao Y, Chen G, Ma B, Wang Y, Wei Y, Qian Y, Kong Z, Hu Y, Ding X, Ping Z, Zhao C, Liu H. Phase transition-driven encapsulation of biomolecules using liquid metal with on-demand release for biomedical applications. Biosens Bioelectron 2024; 259:116403. [PMID: 38776802 DOI: 10.1016/j.bios.2024.116403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 05/10/2024] [Accepted: 05/16/2024] [Indexed: 05/25/2024]
Abstract
Robust encapsulation and controllable release of biomolecules have wide biomedical applications ranging from biosensing and drug delivery to information storage. However, conventional biomolecule encapsulation strategies are limited by complicated operations, optical instability, and difficulty in decapsulation. Here, we report a simple, robust, and solvent-free biomolecule encapsulation strategy based on gallium liquid metal featuring low-temperature phase transition, self-healing, high hermetic sealing, and intrinsic resistance to optical damage. We sandwiched the biomolecules between solid gallium films, followed by low-temperature welding of the films for direct sealing. The gallium can not only protect DNA and enzymes from various physical and chemical damages but also allow the on-demand release of biomolecules by applying vibration to break the liquid gallium. We demonstrated that a DNA-coded image file can be recovered with up to 99.9% sequence retention after an accelerated aging test. We also showed the practical applications of the controllable release of bioreagents in a one-pot RPA-CRISPR/Cas12a reaction for SARS-CoV-2 screening with a low detection limit of 10 copies within 40 min. This work may facilitate the development of robust and stimuli-responsive biomolecule capsules by using low-melting metals for biotechnology.
Affiliation(s)
- Yakun Gao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Gangsheng Chen
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Biao Ma
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
- Yaru Wang
- Key Laboratory of Environmental Medicine and Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210096, China
- Yanjie Wei
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China; BGI Research, Changzhou, 213299, China
- Yunzhi Qian
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Ziyan Kong
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Yian Hu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Xiong Ding
- Key Laboratory of Environmental Medicine and Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210096, China
- Zhi Ping
- BGI Research, Changzhou, 213299, China
- Chao Zhao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Hong Liu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
17
|
Kekić T, Milisavljević N, Troussier J, Tahir A, Debart F, Lietard J. Accelerated, high-quality photolithographic synthesis of RNA microarrays in situ. Sci Adv 2024; 10:eado6762. [PMID: 39083603 PMCID: PMC11290486 DOI: 10.1126/sciadv.ado6762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 06/26/2024] [Indexed: 08/02/2024]
Abstract
Nucleic acid photolithography is the only microarray fabrication process that has demonstrated chemical versatility accommodating any type of nucleic acid. The current approach to RNA microarray synthesis requires long coupling and photolysis times and suffers from unavoidable degradation postsynthesis. In this study, we developed a series of RNA phosphoramidites with improved chemical and photochemical protection of the 2'- and 5'-OH functions. In so doing, we reduced the coupling time by more than half and the photolysis time by a factor of 4. Sequence libraries that would otherwise take over 6 hours to synthesize can now be prepared in half the time. Degradation is substantially lowered, and concomitantly, hybridization signals can reach over seven times those of the previous state of the art. Under those conditions, high-density RNA microarrays and RNA libraries can now be synthesized at greatly accelerated rates. We also synthesized fluorogenic RNA Mango aptamers on microarrays and investigated the effect of sequence mutations on their fluorogenic properties.
Affiliation(s)
- Tadija Kekić
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Joris Troussier
- IBMM, University of Montpellier, CNRS, ENSCM, Montpellier, France
- Amina Tahir
- IBMM, University of Montpellier, CNRS, ENSCM, Montpellier, France
- Françoise Debart
- IBMM, University of Montpellier, CNRS, ENSCM, Montpellier, France
- Jory Lietard
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
18
|
Xu Y, Ding L, Wu S, Ruan J. Overcoming the High Error Rate of Composite DNA Letters-Based Digital Storage through Soft-Decision Decoding. Adv Sci (Weinh) 2024; 11:e2402951. [PMID: 38874370 PMCID: PMC11321706 DOI: 10.1002/advs.202402951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/10/2024] [Indexed: 06/15/2024]
Abstract
Composite DNA letters, by merging all four DNA nucleotides in specified ratios, offer a pathway to substantially increase the logical density of DNA digital storage (DDS) systems. However, these letters are susceptible to nucleotide errors and sampling bias, leading to a high letter error rate, which complicates precise data retrieval and augments reading expenses. To address this, Derrick-cp is introduced as an innovative soft-decision decoding algorithm tailored for DDS utilizing composite letters. Derrick-cp capitalizes on the distinctive error sensitivities among letters to accurately predict and rectify letter errors, thus enhancing the error-correcting performance of Reed-Solomon codes beyond traditional hard-decision decoding limits. Through comparative analyses in the existing dataset and simulated experiments, Derrick-cp's superiority is validated, notably halving the sequencing depth requirement and slashing costs by up to 22% against conventional hard-decision strategies. This advancement signals Derrick-cp's significant role in elevating both the precision and cost-efficiency of composite letter-based DDS.
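To make the idea of soft information for composite letters concrete, the sketch below computes a posterior probability for each candidate letter from the base counts observed at one position, rather than committing to a single hard call. It is only an illustration under stated assumptions and is not the Derrick-cp algorithm: the six-letter alphabet, the uniform substitution-error rate `eps`, the uniform prior, and the name `letter_posteriors` are all hypothetical.

```python
import math

# Hypothetical composite alphabet: each letter is a mixing ratio over (A, C, G, T).
ALPHABET = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
    "M": (0.5, 0.5, 0.0, 0.0),  # equal mix of A and C
    "K": (0.0, 0.0, 0.5, 0.5),  # equal mix of G and T
}


def letter_posteriors(counts, alphabet=ALPHABET, eps=0.01):
    """Posterior over composite letters given read counts (A, C, G, T) at one position.

    Assumes reads are independent draws from the letter's ratio, blurred by a
    uniform error rate eps (must be > 0), with a uniform prior over letters.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be strictly between 0 and 1")
    log_lik = {}
    for name, ratios in alphabet.items():
        probs = [(1.0 - eps) * r + eps / 4.0 for r in ratios]
        log_lik[name] = sum(k * math.log(p) for k, p in zip(counts, probs))
    # Normalize in a numerically stable way (subtract the maximum first).
    top = max(log_lik.values())
    weights = {name: math.exp(v - top) for name, v in log_lik.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

A hard-decision decoder would keep only the most probable letter; a soft-decision decoder can pass the full posterior on, so that positions with few or ambiguous reads are treated as less reliable during error correction.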
Affiliation(s)
- Yaping Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen, 518120, P. R. China
- Lulu Ding
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen, 518120, P. R. China
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, 518060, P. R. China
- Shigang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen, 518120, P. R. China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen, 518120, P. R. China
19
|
Kim JW, Jeong J, Kwak HY, No JS. Design of DNA Storage Coding Scheme With LDPC Codes and Interleaving. IEEE Trans Nanobioscience 2024; 23:447-457. [PMID: 38512749 DOI: 10.1109/tnb.2024.3379976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
In this paper, we propose a new coding scheme for DNA storage using low-density parity-check (LDPC) codes and interleaving techniques. While conventional coding schemes generally employ error-correcting codes in both the inter-oligo and intra-oligo directions, we show that inter-oligo LDPC codes, optimized by differential evolution, are sufficient to ensure the reliability of DNA storage owing to the powerful soft decoding of LDPC codes. In addition, we apply interleaving techniques to handle the non-uniform error characteristics of DNA storage and enhance the decoding performance. Consequently, the proposed coding scheme reduces the required number of oligo reads for perfect recovery by 26.25% to 38.5% compared to existing state-of-the-art coding schemes. Moreover, we develop an analytical DNA channel model in terms of non-uniform binary symmetric channels. This mathematical model allows us to demonstrate the superiority of the proposed coding scheme while isolating the experimental variation, as well as to confirm the independent effects of LDPC codes and interleaving techniques.
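The role of interleaving can be illustrated with a generic block interleaver, which writes codewords as rows of a matrix and transmits the matrix column by column so that neighbouring symbols on the channel belong to different codewords. This is a textbook construction given as a sketch; the abstract does not specify the interleaver used in the paper, and the function names are illustrative.

```python
def interleave(symbols, rows, cols):
    """Write `symbols` row by row into a rows x cols matrix, read it column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("len(symbols) must equal rows * cols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): restore the original row-major order."""
    if len(symbols) != rows * cols:
        raise ValueError("len(symbols) must equal rows * cols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]
```

With `rows` codewords of length `cols`, any run of up to `rows` consecutive corrupted symbols in the interleaved stream touches each codeword at most once, so errors concentrated in one region are spread thinly across many codewords.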
20
|
Mazooji K, Shomorony I. Fast multiple sequence alignment via multi-armed bandits. Bioinformatics 2024; 40:i328-i336. [PMID: 38940160 PMCID: PMC11211838 DOI: 10.1093/bioinformatics/btae225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY: Multiple sequence alignment is an important problem in computational biology with applications that include phylogeny and the detection of remote homology between protein sequences. UPP is a popular software package that constructs accurate multiple sequence alignments for large datasets based on ensembles of hidden Markov models (HMMs). A computational bottleneck for this method is a sequence-to-HMM assignment step, which relies on the precise computation of probability scores on the HMMs. In this work, we show that we can speed up this assignment step significantly by replacing these HMM probability scores with alternative scores that can be efficiently estimated. Our proposed approach utilizes a multi-armed bandit algorithm to adaptively and efficiently compute estimates of these scores. This allows us to achieve alignment accuracy similar to that of UPP with a significant reduction in computation time, particularly for datasets with long sequences. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results in this paper is available on GitHub at: https://github.com/ilanshom/adaptiveMSA.
Affiliation(s)
- Kayvon Mazooji
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Ilan Shomorony
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
21
|
Yu M, Tang X, Li Z, Wang W, Wang S, Li M, Yu Q, Xie S, Zuo X, Chen C. High-throughput DNA synthesis for data storage. Chem Soc Rev 2024; 53:4463-4489. [PMID: 38498347 DOI: 10.1039/d3cs00469d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
With the explosion of the digital world, the dramatically increasing data volume is expected to reach 175 ZB (1 ZB = 10^12 GB) in 2025. Storing such a huge volume of global data would consume tons of resources. Fortunately, it has been found that the deoxyribonucleic acid (DNA) molecule is the most compact and durable information storage medium in the world so far. Its high coding density and long-term preservation properties make it one of the best data storage carriers for the future. High-throughput DNA synthesis is a key technology for "DNA data storage", which encodes a binary data stream (0/1) into long quaternary DNA sequences consisting of four bases (A/G/C/T). In this review, the workflow of DNA data storage and the basic methods of artificial DNA synthesis technology are outlined first. Then, the technical characteristics of different synthesis methods and the state of the art of representative commercial companies are presented, with a primary focus on silicon chip microarray-based synthesis and novel enzymatic DNA synthesis. Finally, the recent status of DNA storage and new opportunities for future development in the field of high-throughput, large-scale DNA synthesis technology are summarized.
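The binary-to-quaternary step mentioned in this abstract can be sketched with the simplest possible mapping, two bits per base. The assignment 00→A, 01→C, 10→G, 11→T and the function names are illustrative assumptions; practical encoders additionally enforce biochemical constraints (for example on GC content and homopolymer runs) and add error-correcting redundancy, none of which is shown here.

```python
BASES = "ACGT"  # illustrative mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a sequence produced by bytes_to_dna() back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises ValueError on a non-ACGT symbol
        out.append(byte)
    return bytes(out)


# Unit check for the figure quoted above: 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes.
ZB_IN_GB = 10**21 // 10**9
```

At two bits per base, one byte occupies four nucleotides, so the unconstrained mapping has a logical density of 2 bits/nt; constraint handling and redundancy lower the density achieved in practice.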
Affiliation(s)
- Meng Yu
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaohui Tang
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Zhenhua Li
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Weidong Wang
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Shaopeng Wang
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Min Li
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Qiuliyang Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
- Sijia Xie
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaolei Zuo
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Chang Chen
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050, Shanghai, China
22
|
Ben Shabat D, Hadad A, Boruchovsky A, Yaakobi E. GradHC: highly reliable gradual hash-based clustering for DNA storage systems. Bioinformatics 2024; 40:btae274. [PMID: 38648049 PMCID: PMC11653902 DOI: 10.1093/bioinformatics/btae274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/27/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024] Open
Abstract
MOTIVATION: As data storage challenges grow and existing technologies approach their limits, synthetic DNA emerges as a promising storage solution due to its remarkable density and durability advantages. While cost remains a concern, emerging sequencing and synthetic technologies aim to mitigate it, yet introduce challenges such as errors in the storage and retrieval process. One crucial task in a DNA storage system is clustering numerous DNA reads into groups that represent the original input strands. RESULTS: In this paper, we review different methods for evaluating clustering algorithms and introduce a novel clustering algorithm for DNA storage systems, named Gradual Hash-based clustering (GradHC). The primary strength of GradHC lies in its capability to cluster with excellent accuracy various types of designs, including varying strand lengths, cluster sizes (including extremely small clusters), and different error ranges. Benchmark analysis demonstrates that GradHC is significantly more stable and robust than other clustering algorithms previously proposed for DNA storage, while also producing highly reliable clustering results. AVAILABILITY AND IMPLEMENTATION: https://github.com/bensdvir/GradHC.
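The clustering task itself (grouping noisy reads by the strand they were copied from) can be illustrated with a naive baseline that compares each read against one representative per cluster using edit distance. This is deliberately not GradHC, whose hash-based procedure is not described in the abstract; the function names and the `max_dist` threshold are hypothetical, and the quadratic cost of this baseline is the kind of bottleneck that hash-based methods are designed to avoid.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def greedy_cluster(reads, max_dist):
    """Assign each read to the first cluster whose representative is within max_dist.

    The representative is simply the first read placed in a cluster; a read
    that matches no representative starts a new cluster.
    """
    clusters = []
    for read in reads:
        for cluster in clusters:
            if edit_distance(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters
```

Each resulting cluster would then be passed to a consensus or reconstruction step to recover the original strand.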
Affiliation(s)
- Dvir Ben Shabat
- Department of Computer Science, Technion, Haifa 320003, Israel
- Adar Hadad
- Department of Computer Science, Technion, Haifa 320003, Israel
- Eitan Yaakobi
- Department of Computer Science, Technion, Haifa 320003, Israel
23
|
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] Open
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
| | - Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
| | - Qi Shao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
| | - Zhenlu Liu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
| | - Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
| | - Yunzhu Zhao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
| | - Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China.
| | - Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
|
24
|
Yang S, Wang D, Zhao Z, Wang N, Yu M, Zhang K, Luo Y, Zhao J. A Novel DNA Synthesis Platform Design with High-Throughput Paralleled Addressability and High-Density Static Droplet Confinement. Biosensors 2024; 14:177. [PMID: 38667170 PMCID: PMC11047993 DOI: 10.3390/bios14040177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 04/04/2024] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
Using DNA as the next-generation medium for data storage offers unparalleled advantages in terms of data density, storage duration, and power consumption as compared to existing data storage technologies. To meet the high-speed data writing requirements in DNA data storage, this paper proposes a novel design for an ultra-high-density and high-throughput DNA synthesis platform. The presented design mainly leverages two functional modules: a dynamic random-access memory (DRAM)-like integrated circuit (IC) responsible for electrode addressing and voltage supply, and the static droplet array (SDA)-based microfluidic structure to eliminate any reaction species diffusion concern in electrochemical DNA synthesis. Through theoretical analysis and simulation studies, we validate the effective addressing of 10 million electrodes and stable, adjustable voltage supply by the integrated circuit. We also demonstrate a reaction unit size down to 3.16 × 3.16 μm², equivalent to 10 million/cm², that can rapidly and stably generate static droplets at each site, effectively constraining proton diffusion. Finally, we conducted a synthesis cycle experiment by incorporating fluorescent beacons on a microfabricated electrode array to examine the feasibility of our design.
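The stated equivalence between a 3.16 × 3.16 μm reaction unit and roughly 10 million units per cm² can be checked with a short calculation. The sketch below assumes the units tile the surface with no gaps between them, which the abstract does not state; the symbols A_unit and ρ are introduced here for illustration only.

```latex
\begin{align*}
A_{\text{unit}} &= 3.16\,\mu\text{m} \times 3.16\,\mu\text{m} \approx 9.99\,\mu\text{m}^2, \\
1\,\text{cm}^2 &= \left(10^{4}\,\mu\text{m}\right)^2 = 10^{8}\,\mu\text{m}^2, \\
\rho &= \frac{10^{8}\,\mu\text{m}^2/\text{cm}^2}{9.99\,\mu\text{m}^2}
      \approx 1.0 \times 10^{7}\ \text{units/cm}^2 .
\end{align*}
```

The side length 3.16 μm is approximately the square root of 10 μm², which is why the density comes out so close to a round 10 million per cm².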
Affiliation(s)
- Shijia Yang
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dayin Wang
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
| | - Zequan Zhao
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ning Wang
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
| | - Meng Yu
- School of Microelectronics, Shanghai University, Shanghai 200444, China
| | - Kaihuan Zhang
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
| | - Yuan Luo
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianlong Zhao
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
|
25
|
Zhang X, Zhou F. An Encoding Table Corresponding to ASCII Codes for DNA Data Storage and a New Error Correction Method HMSA. IEEE Trans Nanobioscience 2024; 23:344-354. [PMID: 38252580 DOI: 10.1109/tnb.2024.3356522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
DNA storage stands out from other storage media due to its high capacity, eco-friendliness, long lifespan, high stability, low energy consumption, and low data maintenance costs. To standardize the DNA encoding system, maintain consistency in character representation and transmission, and link binary, base, and character together, this paper combines the encoding method with ASCII code to construct an ASCII-DNA encoding table. The encoding method can encode not only pure text but also audio and video information, and it satisfies the GC content constraint and the homopolymer constraint, with the encoding density reaching 1.4 bits/nt. In particular, when encoding textual information, it skips the binary conversion step entirely, which reduces encoding complexity and increases the encoding density to 1.6 bits/nt. To address sequence errors, and inspired by heuristic algorithms, this paper proposes a new error correction method (HMSA) that combines minimum Hamming distance, multiple sequence alignment, and the encoding scheme. It can correct not only substitution, insertion, and deletion errors in reads but also consecutive errors. It greatly improves the utilization of reads and avoids wasting resources. Simulation results show that the recovery rate of reads increases with the number of sequencing runs. When a 150 nt sequence contains 5 erroneous bases, the error correction rate can exceed 96% by sequencing the base sequence only 10 times, regardless of whether the errors are consecutive. Additionally, the HMSA error correction method is applicable to all coding schemes based on lookup code tables.
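The two sequence constraints named in the abstract, GC content and homopolymer length, are easy to check programmatically. The sketch below is illustrative only: the 40-60% GC window, the maximum run length of 3, and all function names are assumptions, not values or code from the paper, and the density figure at the end is plain arithmetic rather than the paper's encoding table.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (seq must be non-empty)."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases (seq non-empty)."""
    longest = run = 1
    for prev, curr in zip(seq, seq[1:]):
        run = run + 1 if curr == prev else 1
        longest = max(longest, run)
    return longest


def satisfies_constraints(seq: str, gc_low=0.4, gc_high=0.6, max_run=3) -> bool:
    """Check both constraints; thresholds are assumed defaults, not from the paper."""
    if not seq:
        return False
    return gc_low <= gc_content(seq) <= gc_high and max_homopolymer_run(seq) <= max_run


def encoding_density(num_bits: int, num_nt: int) -> float:
    """Information density in bits per nucleotide."""
    return num_bits / num_nt


print(satisfies_constraints("ACGTACGTAC"))  # balanced GC, no repeated bases
print(satisfies_constraints("AAAACGCG"))    # balanced GC, but a run of four A's
print(encoding_density(8, 5))               # e.g. one 8-bit character in 5 nt
```

As a point of reference, an unconstrained mapping of two bits to each of the four bases would give 2 bits/nt; constraints such as these are what push practical densities below that figure.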
|
26
|
Kiryanova OY, Garafutdinov RR, Gubaydullin IM, Chemeris AV. A novel approach to encode melodies in DNA. Biosystems 2024; 237:105136. [PMID: 38316169 DOI: 10.1016/j.biosystems.2024.105136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/17/2023] [Accepted: 02/02/2024] [Indexed: 02/07/2024]
Abstract
DNA data storage has gained increasing attention in recent decades. DNA molecules can be used to encode non-biological information and are promising carriers owing to their greater data capacity, longer storage duration, and better resilience to technical failures. Here we propose a new method for encoding notes and music in DNA. The encoding technique takes into account the duration and tonality of each note, making it possible to encode all seven octaves by assigning a nucleotide sequence to each key. A set of short sequences is suggested to define the duration of each note. The proposed method allows more complicated melodies to be encoded than the approach based on the Huffman algorithm.
Affiliation(s)
- Olga Yu Kiryanova
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
| | - Ravil R Garafutdinov
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
| | - Irek M Gubaydullin
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
| | - Alexey V Chemeris
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
|
27
|
Kim J, Kim H, Bang D. An open-source, 3D printed inkjet DNA synthesizer. Sci Rep 2024; 14:3773. [PMID: 38355610 PMCID: PMC10867077 DOI: 10.1038/s41598-024-53944-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 02/07/2024] [Indexed: 02/16/2024] Open
Abstract
Synthetic oligonucleotides have become a fundamental tool in a wide range of biological fields, including synthetic biology, biosensing, and DNA storage. Reliable access to equipment for synthesizing high-density oligonucleotides in the laboratory ensures research security and the freedom of research expansion. In this study, we introduced the Open-Source Inkjet DNA Synthesizer (OpenIDS), an open-source inkjet-based microarray synthesizer that offers ease of construction, rapid deployment, and flexible scalability. Utilizing 3D printing, Arduino, and Raspberry Pi, this newly designed synthesizer achieved robust stability with an industrial inkjet printhead. OpenIDS maintains low production costs and is therefore suitable for self-fabrication and optimization in academic laboratories. Moreover, even non-experts can create and control the synthesizer with a high degree of freedom for structural modifications. Users can easily add printheads or alter the design of the microarray substrate according to their research needs. To validate its performance, we synthesized oligonucleotides on 144 spots on a 15 × 25-mm silicon wafer filled with controlled pore glass. The synthesized oligonucleotides were analyzed using urea polyacrylamide gel electrophoresis.
Affiliation(s)
- Junhyeong Kim
- Department of Chemistry, Yonsei University, Seoul, Korea
| | - Haeun Kim
- Department of Chemistry, Yonsei University, Seoul, Korea
| | - Duhee Bang
- Department of Chemistry, Yonsei University, Seoul, Korea.
|
28
|
Schaudy E, Ibañez-Redín G, Parlar E, Somoza MM, Lietard J. Nonaqueous Oxidation in DNA Microarray Synthesis Improves the Oligonucleotide Quality and Preserves Surface Integrity on Gold and Indium Tin Oxide Substrates. Anal Chem 2024; 96:2378-2386. [PMID: 38285499 PMCID: PMC10867803 DOI: 10.1021/acs.analchem.3c04166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 12/29/2023] [Accepted: 12/29/2023] [Indexed: 01/30/2024]
Abstract
Nucleic acids attached to electrically conductive surfaces are very frequently used platforms for sensing and analyte detection as well as for imaging. Synthesizing DNA on these uncommon substrates and preserving the conductive layer is challenging as this coating tends to be damaged by the repeated use of iodine and water, which is the standard oxidizing medium following phosphoramidite coupling. Here, we thoroughly investigate the use of camphorsulfonyl oxaziridine (CSO), a nonaqueous alternative to I2/H2O, for the synthesis of DNA microarrays in situ. We find that CSO performs equally well in producing high hybridization signals on glass microscope slides, and CSO also protects the conductive layer on gold and indium tin oxide (ITO)-coated slides. DNA synthesis on conductive substrates with CSO oxidation yields microarrays of quality approaching that of conventional glass with intact physicochemical properties.
Affiliation(s)
- Erika Schaudy
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, Vienna 1090, Austria
| | - Gisela Ibañez-Redín
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, Vienna 1090, Austria
| | - Etkin Parlar
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, Vienna 1090, Austria
| | - Mark M. Somoza
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, Vienna 1090, Austria
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 30, Freising 85354, Germany
- Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Straße 34, Freising 85354, Germany
| | - Jory Lietard
- Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2, Vienna 1090, Austria
|
29
|
Sabary O, Yucovich A, Shapira G, Yaakobi E. Reconstruction algorithms for DNA-storage systems. Sci Rep 2024; 14:1951. [PMID: 38263421 PMCID: PMC10806084 DOI: 10.1038/s41598-024-51730-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024] Open
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string passes through the DNA-storage channel, which introduces deletion, insertion, and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, which are used for the shortest common supersequence and the longest common subsequence problems, in order to decode the original string. Our algorithms do not require any limitations on the input or the number of traces, and moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a newly synthesized dataset, and are shown to outperform previous algorithms in reconstruction accuracy.
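The abstract names the longest common subsequence (LCS) dynamic program as one building block. As a minimal sketch of that building block only (not the paper's reconstruction algorithms, which combine information across many traces), the following computes one LCS of two traces; the function name and example traces are illustrative assumptions.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


# Two hypothetical traces of the string "ACGTACGT":
# one with a deletion, one with an insertion.
trace_1 = "ACGTCGT"
trace_2 = "ACGTAACGT"
common = lcs(trace_1, trace_2)
print(common, len(common))
```

The shortest common supersequence of two strings is related to the same table: its length equals len(a) + len(b) minus the LCS length.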
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel.
| | - Alexander Yucovich
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Guy Shapira
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
|
30
|
Lin W, Chu L, Su Y, Xie R, Yao X, Zan X, Xu P, Liu W. Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method. Comput Biol Med 2023; 166:107548. [PMID: 37801922 DOI: 10.1016/j.compbiomed.2023.107548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 08/24/2023] [Accepted: 09/28/2023] [Indexed: 10/08/2023]
Abstract
BACKGROUND In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences can interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect is seldom studied. METHOD As the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences. RESULTS Our simulation results indicate that the free energy of DNA sequences at a specific length follows a right-skewed distribution and that the mean increases as the length increases. Given a tolerable free energy threshold of 20 kcal/mol, we could control the ratio of serious secondary structures in the encoding sequences to within 1% of the significant level through selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²). It achieved MRE = 0.109 and R² = 0.918 in the simulation experiment. The combination of the BiLSTM and attention module can handle long-term dependencies and capture the features of base pairing. Further, the prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of 94 predicted free energy can be screened out on a real dataset. It demonstrates that the proposed model could screen out some highly suspicious sequences which are prone to produce more errors and low sequencing copies.
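The two evaluation metrics quoted above can be computed in a few lines. The sketch below uses the common textbook definitions of mean relative error and the coefficient of determination; the abstract does not spell out its exact formulas, so treating these as the paper's definitions is an assumption, and the example values are made up for illustration rather than taken from the study.

```python
def mean_relative_error(y_true, y_pred):
    """Average of |prediction - truth| / |truth| (truth values must be non-zero)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when all true values are identical (SS_tot would be zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up free-energy magnitudes (kcal/mol) and model predictions.
measured = [10.0, 20.0, 30.0]
predicted = [11.0, 18.0, 33.0]
mre = mean_relative_error(measured, predicted)
r2 = r_squared(measured, predicted)
print(round(mre, 3), round(r2, 3))
```

A lower MRE and an R² closer to 1 both indicate better agreement between predicted and reference free energies.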
Affiliation(s)
- Wanmin Lin
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ling Chu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Yanqing Su
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ranze Xie
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Xiangyu Yao
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Xiangzhen Zan
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Peng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
| | - Wenbin Liu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
|
31
|
Volkel KD, Lin KN, Hook PW, Timp W, Keung AJ, Tuck JM. FrameD: framework for DNA-based data storage design, verification, and validation. Bioinformatics 2023; 39:btad572. [PMID: 37713474 PMCID: PMC10563143 DOI: 10.1093/bioinformatics/btad572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/27/2023] [Accepted: 09/13/2023] [Indexed: 09/17/2023] Open
Abstract
MOTIVATION DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components. RESULTS We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using multiple sequence alignment algorithms and others that do not. We found that the choice to include multiple sequence alignment in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems. AVAILABILITY AND IMPLEMENTATION The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).
Affiliation(s)
- Kevin D Volkel
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
| | - Kevin N Lin
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
| | - Albert J Keung
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, United States
| | - James M Tuck
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, 27606, United States
|
32
|
Gimpel AL, Stark WJ, Heckel R, Grass RN. A digital twin for DNA data storage based on comprehensive quantification of errors and biases. Nat Commun 2023; 14:6026. [PMID: 37758710 PMCID: PMC10533828 DOI: 10.1038/s41467-023-41729-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023] Open
Abstract
Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of Error Correction Coding (ECC) which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin's ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.
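To illustrate what simulating errors in a DNA storage workflow means at the simplest level, here is a toy channel that applies independent insertion, deletion, and substitution errors to each base. It is a deliberately crude sketch and not the digital twin described above, which is fitted to experimental data and models biases this code ignores; the function name, parameters, and error rates are all assumptions.

```python
import random

BASES = "ACGT"


def noisy_copy(strand: str, p_sub: float, p_ins: float, p_del: float,
               rng: random.Random) -> str:
    """Return a noisy copy of strand under an i.i.d. per-base error model."""
    out = []
    for base in strand:
        # Possibly insert a random base before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            continue  # the base is deleted
        if r < p_del + p_sub:
            # Substitute with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # the base is copied correctly
    return "".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
original = "ACGTACGTACGTACGT"
reads = [noisy_copy(original, p_sub=0.02, p_ins=0.01, p_del=0.01, rng=rng)
         for _ in range(5)]
for read in reads:
    print(read)
```

Reads generated this way can be fed to a clustering or reconstruction routine to estimate how much redundancy a given error rate demands, which is the kind of question the digital twin is designed to answer with realistic error models.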
Affiliation(s)
- Andreas L Gimpel
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
| | - Wendelin J Stark
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
| | - Reinhard Heckel
- Department of Computer Engineering, Technical University of Munich, Arcistrasse 21, 80333, Munich, Germany
| | - Robert N Grass
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland.
|
33
|
Yan Y, Pinnamaneni N, Chalapati S, Crosbie C, Appuswamy R. Scaling logical density of DNA storage with enzymatically-ligated composite motifs. Sci Rep 2023; 13:15978. [PMID: 37749195 PMCID: PMC10519978 DOI: 10.1038/s41598-023-43172-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2023] [Accepted: 09/20/2023] [Indexed: 09/27/2023] Open
Abstract
DNA is a promising candidate for long-term data storage due to its high density and endurance. The key challenge in DNA storage today is the cost of synthesis. In this work, we propose composite motifs, a framework that uses a mixture of prefabricated motifs as building blocks to reduce synthesis cost by scaling logical density. To write data, we introduce Bridge Oligonucleotide Assembly, an enzymatic ligation technique for synthesizing oligos based on composite motifs. To sequence data, we introduce Direct Oligonucleotide Sequencing, a nanopore-based technique to sequence short oligos, eliminating common preparatory steps like DNA assembly, amplification and end-prep. To decode data, we introduce Motif-Search, a novel consensus caller that provides accurate reconstruction despite synthesis and sequencing errors. Using the proposed methods, we present an end-to-end experiment where we store the text "HelloWorld" at a logical density of 84 bits/cycle (14-42× improvement over state-of-the-art).
Affiliation(s)
- Yiqing Yan
- Data Science Department, EURECOM, Biot, France
|
34
|
Zhao Y, Cao B, Wang P, Wang K, Wang B. DBTRG: De Bruijn Trim rotation graph encoding for reliable DNA storage. Comput Struct Biotechnol J 2023; 21:4469-4477. [PMID: 37736298 PMCID: PMC10510065 DOI: 10.1016/j.csbj.2023.09.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/20/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/23/2023] Open
Abstract
DNA is a high-density, long-term stable, and scalable storage medium that can meet the increased demands on storage media resulting from the exponential growth of data. Existing DNA storage encoding schemes tend to achieve high-density storage but do not fully consider the local and global stability of DNA sequences or the read and write accuracy of the stored information. To address these problems, this article presents a graph-based De Bruijn Trim Rotation Graph (DBTRG) encoding scheme. Through XOR between the proposed dynamic binary sequence and the original binary sequence, k-mers can be divided into the De Bruijn Trim graph, and the stored information can be compressed according to the overlapping relationship. The simulated experimental results show that DBTRG ensures base balance and diversity, reduces the likelihood of undesired motifs, and improves the stability of DNA storage and data recovery. Furthermore, DBTRG maintains an encoding rate of 1.92 while storing 510 KB images and introduces novel approaches and concepts for DNA storage encoding.
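The XOR step mentioned in the abstract can be illustrated in isolation. The sketch below XORs a payload with a mask and maps the result to bases at two bits per nucleotide, then reverses both steps. It is only a generic illustration of why XOR masking is lossless and can break up repetitive input; the seeded pseudo-random mask stands in for the paper's "dynamic binary sequence", whose construction is not given here, and the graph-based parts of DBTRG are not modelled. All names are hypothetical.

```python
import random

BASES = "ACGT"


def xor_bits(bits, mask):
    """Element-wise XOR of two equal-length bit lists."""
    return [b ^ m for b, m in zip(bits, mask)]


def bits_to_dna(bits):
    """Map each pair of bits to one base (00->A, 01->C, 10->G, 11->T)."""
    return "".join(BASES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2))


def dna_to_bits(seq):
    """Inverse of bits_to_dna."""
    bits = []
    for base in seq:
        value = BASES.index(base)
        bits.extend([value >> 1, value & 1])
    return bits


payload = [0] * 16                      # a worst case: would encode as "AAAAAAAA"
rng = random.Random(42)                 # assumed stand-in for the dynamic sequence
mask = [rng.randint(0, 1) for _ in payload]

encoded = bits_to_dna(xor_bits(payload, mask))
decoded = xor_bits(dna_to_bits(encoded), mask)
print(encoded, decoded == payload)
```

Because XOR with the same mask is its own inverse, decoding needs only the mask (or the seed that generates it), so the masking itself adds no redundancy to the stored strand.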
Affiliation(s)
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Penghao Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
|
35
|
Raza MH, Desai S, Aravamudhan S, Zadegan R. An outlook on the current challenges and opportunities in DNA data storage. Biotechnol Adv 2023; 66:108155. [PMID: 37068530 PMCID: PMC11060094 DOI: 10.1016/j.biotechadv.2023.108155] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/05/2022] [Revised: 03/23/2023] [Accepted: 04/12/2023] [Indexed: 04/19/2023]
Abstract
Silicon is the gold standard for information storage systems. The exponential generation of digital information will exhaust the global supply of refined silicon. Therefore, investing in alternative information storage materials such as DNA has gained momentum. DNA as a memory material possesses several advantages over silicon-based data storage, including higher storage capacity, data retention, and lower operational energy. Routine DNA data storage approaches encode data into chemically synthesized nucleotide sequences. The scalability of DNA data storage depends on factors such as the cost and the generation of hazardous waste during DNA synthesis, latency of writing and reading, and limited rewriting capacity. Here, we review the current status of DNA data storage encoding, writing, storing, retrieving and reading, and discuss the technology's challenges and opportunities.
Affiliation(s)
- Muhammad Hassan Raza
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA
| | - Salil Desai
- Department of Industrial & Systems Engineering, North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
| | - Shyam Aravamudhan
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA
| | - Reza Zadegan
- Department of Nanoengineering, Joint School of Nanoscience & Nanoengineering, Greensboro, NC 27401, USA; Center of Excellence in Product Design and Advanced Manufacturing (CEPDAM), North Carolina Agricultural & Technical State University, Greensboro, NC 27411, USA.
| |
Collapse
|
36
|
Zhang XE, Liu C, Dai J, Yuan Y, Gao C, Feng Y, Wu B, Wei P, You C, Wang X, Si T. Enabling technology and core theory of synthetic biology. SCIENCE CHINA. LIFE SCIENCES 2023; 66:1742-1785. [PMID: 36753021 PMCID: PMC9907219 DOI: 10.1007/s11427-022-2214-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/08/2022] [Accepted: 10/04/2022] [Indexed: 02/09/2023]
Abstract
Synthetic biology provides a new paradigm for life science research ("build to learn") and opens the future journey of biotechnology ("build to use"). Here, we discuss advances of various principles and technologies in the mainstream of the enabling technology of synthetic biology, including synthesis and assembly of a genome, DNA storage, gene editing, molecular evolution and de novo design of function proteins, cell and gene circuit engineering, cell-free synthetic biology, artificial intelligence (AI)-aided synthetic biology, as well as biofoundries. We also introduce the concept of quantitative synthetic biology, which is guiding synthetic biology towards increased accuracy and predictability or the real rational design. We conclude that synthetic biology will establish its disciplinary system with the iterative development of enabling technologies and the maturity of the core theory.
Affiliation(s)
- Xian-En Zhang
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Chenli Liu
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Junbiao Dai
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
- Caixia Gao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- Yan Feng
- State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Bian Wu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
- Ping Wei
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Chun You
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China.
- Tong Si
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.

37
Lim CK, Yeoh JW, Kunartama AA, Yew WS, Poh CL. A biological camera that captures and stores images directly into DNA. Nat Commun 2023; 14:3921. [PMID: 37400476 DOI: 10.1038/s41467-023-38876-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/19/2022] [Accepted: 05/19/2023] [Indexed: 07/05/2023] [Open Access]
Abstract
The increasing integration between biological and digital interfaces has led to heightened interest in utilizing biological materials to store digital data, with the most promising one involving the storage of data within defined sequences of DNA that are created by de novo DNA synthesis. However, there is a lack of methods that can obviate the need for de novo DNA synthesis, which tends to be costly and inefficient. Here, in this work, we detail a method of capturing 2-dimensional light patterns into DNA, by utilizing optogenetic circuits to record light exposure into DNA, encoding spatial locations with barcoding, and retrieving stored images via high-throughput next-generation sequencing. We demonstrate the encoding of multiple images into DNA, totaling 1152 bits, selective image retrieval, as well as robustness to drying, heat and UV. We also demonstrate successful multiplexing using multiple wavelengths of light, capturing 2 different images simultaneously using red and blue light. This work thus establishes a 'living digital camera', paving the way towards integrating biological systems with digital devices.
Affiliation(s)
- Cheng Kai Lim
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), NUS Graduate School, National University of Singapore, Singapore, Singapore
- Jing Wui Yeoh
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
- Aurelius Andrew Kunartama
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore
- Wen Shan Yew
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore
- Chueh Loo Poh
- Synthetic Biology for Clinical and Technological Innovation, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore.
- Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore, Singapore.

38
Yang X, Shi X, Lai L, Chen C, Xu H, Deng M. Towards long double-stranded chains and robust DNA-based data storage using the random code system. Front Genet 2023; 14:1179867. [PMID: 37384333 PMCID: PMC10294226 DOI: 10.3389/fgene.2023.1179867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 05/31/2023] [Indexed: 06/30/2023] [Open Access]
Abstract
DNA has become a popular choice for next-generation storage media due to its high storage density and stability. As the storage medium of life's information, DNA has significant storage capacity and low-cost, low-power replication and transcription capabilities. However, utilizing long double-stranded DNA for storage can introduce unstable factors that make it difficult to meet the constraints of biological systems. To address this challenge, we have designed a highly robust coding scheme called the "random code system," inspired by the idea of fountain codes. The random code system includes the establishment of a random matrix, Gaussian preprocessing, and random equilibrium. Compared to Luby transform codes (LT codes), random code (RC) has better robustness and recovery ability of lost information. In biological experiments, we successfully stored 29,390 bits of data in 25,700 bp chains, achieving a storage density of 1.78 bits per nucleotide. These results demonstrate the potential for using long double-stranded DNA and the random code system for robust DNA-based data storage.
Affiliation(s)
- Xu Yang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Xiaolong Shi
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Langwen Lai
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Congzhou Chen
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Huaisheng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Ming Deng
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China

39
Xu C, Ma B, Dong X, Lei L, Hao Q, Zhao C, Liu H. Assembly of Reusable DNA Blocks for Data Storage Using the Principle of Movable Type Printing. ACS APPLIED MATERIALS & INTERFACES 2023; 15:24097-24108. [PMID: 37184884 DOI: 10.1021/acsami.3c01860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Due to its high coding density and longevity, DNA is a compelling data storage alternative. However, current DNA data storage systems rely on the de novo synthesis of enormous DNA molecules, resulting in low data editability, high synthesis costs, and restrictions on further applications. Here, we demonstrate the programmable assembly of reusable DNA blocks for versatile data storage using the ancient movable type printing principle. Digital data are first encoded into nucleotide sequences in DNA hairpins, which are then synthesized and immobilized on solid beads as modular DNA blocks. Using DNA polymerase-catalyzed primer exchange reaction, data can be continuously replicated from hairpins on DNA blocks and attached to a primer in tandem to produce new information. The assembly of DNA blocks is highly programmable, producing various data by reusing a finite number of DNA blocks and reducing synthesis costs (∼1718 versus 3000 to 30,000 US$ per megabyte using conventional methods). We demonstrate the flexible assembly of texts, images, and random numbers using DNA blocks and the integration with DNA logic circuits to manipulate data synthesis. This work suggests a flexible paradigm by recombining already synthesized DNA to build cost-effective and intelligent DNA data storage systems.
Affiliation(s)
- Chengtao Xu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Biao Ma
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Xing Dong
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Lanjie Lei
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Qing Hao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Chao Zhao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
- Hong Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China

40
Talbot H, Halvorsen K, Chandrasekaran AR. Encoding, Decoding, and Rendering Information in DNA Nanoswitch Libraries. ACS Synth Biol 2023; 12:978-983. [PMID: 36541933 PMCID: PMC10121895 DOI: 10.1021/acssynbio.2c00649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
DNA-based construction allows the creation of molecular devices that are useful in information storage and processing. Here, we combine the programmability of DNA nanoswitches and stimuli-responsive conformational changes to demonstrate information encoding and graphical readout using gel electrophoresis. We encoded information as 5-bit binary codes for alphanumeric characters using a combination of DNA and RNA inputs that can be decoded using molecular stimuli such as a ribonuclease. We also show that a similar strategy can be used for graphical visual readout of alphabets on an agarose gel, information that is encoded by nucleic acids and decoded by a ribonuclease. Our method of information encoding and processing could be combined with DNA actuation for molecular computation and diagnostics that require a nonarbitrary visual readout.
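As a rough illustration of the 5-bit character codes mentioned in the abstract above, the following Python sketch maps letters to 5-bit binary strings and back. The code table used here (A = 00001 through Z = 11010) and the function names are assumptions made purely for illustration; the abstract does not state which mapping the authors used, and the nanoswitch chemistry and gel readout are not modeled.

```python
# Hypothetical 5-bit code table: A=1 ... Z=26 (an assumption, not taken from the paper).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(text: str) -> list[str]:
    """Return one 5-bit binary string per letter of `text` (letters A-Z only)."""
    return [format(ALPHABET.index(ch) + 1, "05b") for ch in text.upper()]


def decode(codes: list[str]) -> str:
    """Invert `encode`: turn each 5-bit string back into its letter."""
    return "".join(ALPHABET[int(code, 2) - 1] for code in codes)


print(encode("DNA"))          # ['00100', '01110', '00001']
print(decode(encode("DNA")))  # DNA
```

Five bits give 32 distinct values, so a table like this covers the 26 letters with a few codes to spare; covering digits as well would require sharing or reassigning codes, which this sketch does not attempt.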
Affiliation(s)
- Hannah Talbot
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States
- Ken Halvorsen
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States
- Arun Richard Chandrasekaran
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12203, United States

41
Zan X, Chu L, Xie R, Su Y, Yao X, Xu P, Liu W. An image cryptography method by highly error-prone DNA storage channel. Front Bioeng Biotechnol 2023; 11:1173763. [PMID: 37152655 PMCID: PMC10154519 DOI: 10.3389/fbioe.2023.1173763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 03/30/2023] [Indexed: 05/09/2023] [Open Access]
Abstract
Introduction: Rapid development in synthetic technologies has boosted DNA as a potential medium for large-scale data storage. Meanwhile, how to implement data security in the DNA storage system is still an unsolved problem. Methods: In this article, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to take advantage of the unpredictable modulation signals to encrypt images in highly error-prone DNA storage channels. Results and Discussion: Numerical results have demonstrated that our image encryption method is feasible and effective with excellent security against various attacks (statistical, differential, noise, and data loss). When compared with other methods such as the hybridization reactions of DNA molecules, the proposed method is more reliable and feasible for large-scale applications.
Affiliation(s)
- Xiangzhen Zan
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Ling Chu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Ranze Xie
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Yanqing Su
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Xiangyu Yao
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Peng Xu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China
- Wenbin Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China

42
Luo Y, Cao Z, Liu Y, Zhang R, Yang S, Wang N, Shi Q, Li J, Dong S, Fan C, Zhao J. The emerging landscape of microfluidic applications in DNA data storage. LAB ON A CHIP 2023; 23:1981-2004. [PMID: 36946437 DOI: 10.1039/d2lc00972b] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/05/2023]
Abstract
DNA has been considered a promising alternative to the current solid-state devices for digital information storage. The past decade has witnessed tremendous progress in the field of DNA data storage contributed by researchers from various disciplines. However, the current development status of DNA storage is still far from practical use, mainly due to its high material cost and time consumption for data reading/writing, as well as the lack of a comprehensive, automated, and integrated system. Microfluidics, being capable of handling and processing micro-scale fluid samples in a massively paralleled and highly integrated manner, has gradually been recognized as a promising candidate for addressing the aforementioned issues. In this review, we provide a discussion on recent efforts of applying microfluidics to advance the development of DNA data storage. Moreover, to showcase the tremendous potential that microfluidics can contribute to this field, we will further highlight the recent advancements of applying microfluidics to the key functional modules within the DNA data storage workflow. Finally, we share our perspectives on future directions for how to continue the infusion of microfluidics with DNA data storage and how to advance toward a truly integrated system and reach real-life applications.
Affiliation(s)
- Yuan Luo
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhen Cao
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China.
- International Joint Innovation Center, Zhejiang University, Haining 314400, China
- Yifan Liu
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
- Rong Zhang
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Shijia Yang
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Ning Wang
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Qingyuan Shi
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Jie Li
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Shurong Dong
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China.
- International Joint Innovation Center, Zhejiang University, Haining 314400, China
- Chunhai Fan
- School of Chemistry and Chemical Engineering, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jianlong Zhao
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, 100101, P.R. China

43
Xie R, Zan X, Chu L, Su Y, Xu P, Liu W. Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage. BMC Bioinformatics 2023; 24:111. [PMID: 36959531 PMCID: PMC10037887 DOI: 10.1186/s12859-023-05237-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/20/2022] [Accepted: 03/17/2023] [Indexed: 03/25/2023] [Open Access]
Abstract
Synchronization (insertions-deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
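To make the idea of recovering a strand from multiple aligned reads concrete, here is a minimal Python sketch of column-wise majority voting over reads that have already been aligned. It is not MAFFT and does not perform alignment itself: it assumes an external aligner has padded the reads to equal length with `-` gap characters, and the `consensus` function and the example reads are hypothetical.

```python
from collections import Counter

GAP = "-"  # gap symbol assumed to be produced by the (external) aligner


def consensus(aligned_reads: list[str]) -> str:
    """Majority-vote consensus over equal-length, pre-aligned reads.

    Columns whose most common symbol is a gap are dropped, which removes
    insertions carried by a minority of reads. Ties resolve to the symbol
    that appears first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    result = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != GAP:
            result.append(symbol)
    return "".join(result)


# Five noisy copies of the strand ACGTACGT, already aligned.
reads = [
    "ACGT-ACGT",  # error-free
    "ACGTTACGT",  # one inserted T
    "ACG--ACGT",  # one deleted T
    "ACGT-ACCT",  # one substitution (G -> C)
    "ACGT-ACGT",  # error-free
]
print(consensus(reads))  # ACGTACGT
```

With five reads, each error type appears in only one read per column, so the majority symbol in every column is the correct one. Raising the number of reads corresponds to the sequencing depth discussed in the abstract.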
Affiliation(s)
- Ranze Xie
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Xiangzhen Zan
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Ling Chu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Yanqing Su
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Peng Xu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Wenbin Liu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.

44
Bencurova E, Akash A, Dobson RC, Dandekar T. DNA storage-from natural biology to synthetic biology. Comput Struct Biotechnol J 2023; 21:1227-1235. [PMID: 36817961 PMCID: PMC9932295 DOI: 10.1016/j.csbj.2023.01.045] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/19/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/04/2023] [Open Access]
Abstract
Natural DNA storage allows cellular differentiation, evolution, the growth of our children and controls all our ecosystems. Here, we discuss the fundamental aspects of DNA storage and recent advances in this field, with special emphasis on natural processes and solutions that can be exploited. We point out new ways of efficient DNA and nucleotide storage that are inspired by nature. Within a few years DNA-based information storage may become an attractive and natural complementation to current electronic data storage systems. We discuss rapid and directed access (e.g. DNA elements such as promotors, enhancers), regulatory signals and modulation (e.g. lncRNA) as well as integrated high-density storage and processing modules (e.g. chromosomal territories). There is pragmatic DNA storage for use in biotechnology and human genetics. We examine DNA storage as an approach for synthetic biology (e.g. light-controlled nucleotide processing enzymes). The natural polymers of DNA and RNA offer much for direct storage operations (read-in, read-out, access control). The inbuilt parallelism (many molecules at many places working at the same time) is important for fast processing of information. Using biology concepts from chromosomal storage, nucleic acid processing as well as polymer material sciences such as electronical effects in enzymes, graphene, nanocellulose up to DNA macramé, DNA wires and DNA-based aptamer field effect transistors will open up new applications gradually replacing classical information storage methods in ever more areas over time (decades).
Affiliation(s)
- Elena Bencurova
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
- Aman Akash
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
- Renwick C.J. Dobson
- Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Australia
- Thomas Dandekar
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
- Corresponding author at: Department of Bioinformatics, University of Würzburg, Würzburg, Germany.

45
Doricchi A, Platnich CM, Gimpel A, Horn F, Earle M, Lanzavecchia G, Cortajarena AL, Liz-Marzán LM, Liu N, Heckel R, Grass RN, Krahne R, Keyser UF, Garoli D. Emerging Approaches to DNA Data Storage: Challenges and Prospects. ACS NANO 2022; 16:17552-17571. [PMID: 36256971 PMCID: PMC9706676 DOI: 10.1021/acsnano.2c06748] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 05/11/2023]
Abstract
With the total amount of worldwide data skyrocketing, the global data storage demand is predicted to grow to 1.75 × 10¹⁴ GB by 2025. Traditional storage methods have difficulties keeping pace given that current storage media have a maximum density of 10³ GB/mm³. As such, data production will far exceed the capacity of currently available storage methods. The costs of maintaining and transferring data, as well as the limited lifespans and significant data losses associated with current technologies also demand advanced solutions for information storage. Nature offers a powerful alternative through the storage of information that defines living organisms in unique orders of four bases (A, T, C, G) located in molecules called deoxyribonucleic acid (DNA). DNA molecules as information carriers have many advantages over traditional storage media. Their high storage density, potentially low maintenance cost, ease of synthesis, and chemical modification make them an ideal alternative for information storage. To this end, rapid progress has been made over the past decade by exploiting user-defined DNA materials to encode information. In this review, we discuss the most recent advances of DNA-based data storage with a major focus on the challenges that remain in this promising field, including the current intrinsic low speed in data writing and reading and the high cost per byte stored. Alternatively, data storage relying on DNA nanostructures (as opposed to DNA sequence) as well as on other combinations of nanomaterials and biomolecules are proposed with promising technological and economic advantages. In summarizing the advances that have been made and underlining the challenges that remain, we provide a roadmap for the ongoing research in this rapidly growing field, which will enable the development of technological solutions to the global demand for superior storage methodologies.
Affiliation(s)
- Andrea Doricchi
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
  - Dipartimento di Chimica e Chimica Industriale, Università di Genova, via Dodecaneso 31, 16146 Genova, Italy
- Casey M. Platnich
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Andreas Gimpel
  - Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
- Friederikee Horn
  - Technical University of Munich, Department of Electrical and Computer Engineering, Munchen, Bayern, DE 80333, Germany
- Max Earle
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- German Lanzavecchia
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
  - Dipartimento di Fisica, Università di Genova, via Dodecaneso 33, 16146 Genova, Italy
- Aitziber L. Cortajarena
  - Center for Cooperative Research in Biomaterials (CICbiomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 194, 20014 Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
- Luis M. Liz-Marzán
  - Center for Cooperative Research in Biomaterials (CICbiomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 194, 20014 Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
  - Biomedical Research Networking Center in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Av. Monforte de Lemos, 3-5. Pabellón 11. Planta 0, 28029 Madrid, Spain
- Na Liu
  - Second Physics Institute, University of Stuttgart, 70569 Stuttgart, Germany
  - Max Planck Institute for Solid State Research, 70569 Stuttgart, Germany
- Reinhard Heckel
  - Technical University of Munich, Department of Electrical and Computer Engineering, Munchen, Bayern, DE 80333, Germany
- Robert N. Grass
  - Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
- Roman Krahne
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
- Ulrich F. Keyser
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Denis Garoli
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
46. Meiser LC, Gimpel AL, Deshpande T, Libort G, Chen WD, Heckel R, Nguyen BH, Strauss K, Stark WJ, Grass RN. Information decay and enzymatic information recovery for DNA data storage. Commun Biol 2022; 5:1117. [PMID: 36266439 PMCID: PMC9584896 DOI: 10.1038/s42003-022-04062-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/23/2022] [Accepted: 09/30/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic DNA has been proposed as a storage medium for digital information due to its high theoretical storage density and anticipated long storage horizons. However, under all ambient storage conditions, DNA undergoes a slow chemical decay process resulting in nicked (broken) DNA strands, and the information stored in these strands is no longer readable. In this work we design an enzymatic repair procedure, which is applicable to the DNA pool prior to readout and can partially reverse the damage. Through a chemical understanding of the decay process, an overhang at the 3' end of the damaged site is identified as obstructive to repair via the base excision-repair (BER) mechanism. The obstruction can be removed via the enzyme apurinic/apyrimidinic endonuclease I (APE1), thereby enabling repair of hydrolytically damaged DNA via Bst polymerase and Taq ligase. Simulations of damage and repair reveal the benefit of the enzymatic repair step for DNA data storage, especially when data is stored in DNA at high storage densities (=low physical redundancy) and for long time durations.
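The trade-off described in this abstract (decay versus physical redundancy, with repair recovering part of the damage) can be illustrated with a toy calculation. The sketch below is not the simulation used in the paper. It assumes first-order decay with a hypothetical half-life, independent breakage of each strand copy, and a hypothetical `repair_yield` giving the fraction of broken copies made readable again. All function and parameter names are illustrative.

```python
def intact_fraction(t_years, half_life_years):
    """Fraction of strand copies still unbroken after t_years (first-order decay)."""
    return 0.5 ** (t_years / half_life_years)


def sequence_loss_probability(t_years, half_life_years, copies, repair_yield=0.0):
    """Probability that every copy of one sequence is unreadable.

    Assumes copies break independently. repair_yield is the assumed
    fraction of broken copies that a repair step makes readable again.
    """
    p_intact = intact_fraction(t_years, half_life_years)
    p_readable = p_intact + (1.0 - p_intact) * repair_yield
    return (1.0 - p_readable) ** copies


# Compare low physical redundancy with and without a repair step.
for repair in (0.0, 0.5):
    loss = sequence_loss_probability(100, 50, copies=3, repair_yield=repair)
    print(f"repair_yield={repair}: loss probability {loss:.4f}")
```

Under these assumptions, the benefit of repair is largest when `copies` is small and storage time is long, which matches the qualitative conclusion stated in the abstract.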
Affiliation(s)
- Linda C Meiser
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Andreas L Gimpel
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Tejas Deshpande
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Gabriela Libort
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Weida D Chen
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Reinhard Heckel
  - Department of Electrical and Computer Engineering, Technical University of Munich, Arcistrasse 21, 80333, Munich, Germany
- Wendelin J Stark
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Robert N Grass
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
47. Song L, Geng F, Gong ZY, Chen X, Tang J, Gong C, Zhou L, Xia R, Han MZ, Xu JY, Li BZ, Yuan YJ. Robust data storage in DNA by de Bruijn graph-based de novo strand assembly. Nat Commun 2022; 13:5361. [PMID: 36097016 PMCID: PMC9468002 DOI: 10.1038/s41467-022-33046-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/31/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
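The core idea named in this abstract, a de Bruijn graph combined with a greedy path search, can be sketched in a few lines. This is a heavily simplified illustration and not the DBGPS implementation. It assumes the first k bases of the strand are known (for example from an index or anchor), that k is at least 2, and that the correct k-mer is the most frequent one at every branch. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads; the counts implicitly define the de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(counts, start, length):
    """Extend `start` base by base, always following the most frequent k-mer."""
    k = len(start)  # assumed to be >= 2
    strand = start
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        candidates = [(counts[suffix + b], b) for b in "ACGT" if counts[suffix + b] > 0]
        if not candidates:
            break  # dead end: no observed k-mer continues the path
        strand += max(candidates)[1]
    return strand


# Toy input: three fragments of a broken strand plus one fragment with a substitution.
reads = ["ACGTTGCAAG", "TGCAAGGCTA", "GTTGCAAGGC", "TTGCTAGG"]
counts = count_kmers(reads, k=4)
assembled = greedy_assemble(counts, start="ACGT", length=14)
print(assembled)
```

No single read covers the full strand here, yet the path through the k-mer counts reconstructs it, which is why this style of assembly tolerates strand breaks. A practical decoder would also need a way to verify candidate strands, which this sketch omits.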
Affiliation(s)
- Lifu Song
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Feng Geng
  - College of Pharmacy, Binzhou Medical University, Yantai, 264003, Shandong Province, China
- Zi-Yi Gong
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Xin Chen
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Chunye Gong
  - National SuperComputer Center in Tianjin, Tianjin, 300457, China
- Libang Zhou
  - College of Food Science and Technology, Nanjing Agricultural University, Nanjing, 210095, Jiangsu Province, China
- Rui Xia
  - National SuperComputer Center in Tianjin, Tianjin, 300457, China
- Ming-Zhe Han
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Jing-Yi Xu
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Bing-Zhi Li
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Ying-Jin Yuan
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
48. Qu G, Yan Z, Wu H. Clover: tree structure-based efficient DNA clustering for DNA-based data storage. Brief Bioinform 2022; 23:6668252. [PMID: 35975958 DOI: 10.1093/bib/bbac336] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/18/2022] [Revised: 07/21/2022] [Accepted: 07/22/2022] [Indexed: 11/12/2022] Open
Abstract
Deoxyribonucleic acid (DNA)-based data storage is a promising new storage technology which has the advantage of high storage capacity and long storage time compared with traditional storage media. However, the synthesis and sequencing process of DNA can randomly generate many types of errors, which makes it more difficult to cluster DNA sequences to recover DNA information. Currently, the available DNA clustering algorithms are targeted at DNA sequences in the biological domain, which not only cannot adapt to the characteristics of sequences in DNA storage, but also tend to be unacceptably time-consuming for billions of DNA sequences in DNA storage. In this paper, we propose an efficient DNA clustering method termed Clover for DNA storage with linear computational complexity and low memory. Clover avoids the computation of the Levenshtein distance by using a tree structure for interval-specific retrieval. We argue through theoretical proofs that Clover has standard linear computational complexity, low space complexity, etc. Experiments show that our method can cluster 10 million DNA sequences into 50 000 classes in 10 s and meet an accuracy rate of over 99%. Furthermore, we have successfully completed an unprecedented clustering of 10 billion DNA data on a single home computer and the time consumption still satisfies the linear relationship. Clover is freely available at https://github.com/Guanjinqu/Clover.
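To show how a tree lookup can replace pairwise Levenshtein comparisons, the sketch below clusters reads by looking up fixed intervals of each read in a trie. It is a simplified illustration and not the Clover algorithm itself; the actual implementation is in the repository linked in the abstract. The two intervals, the dict-based trie, and all function names are assumptions. The sketch tolerates a substitution in one interval as long as the other interval matches, but it does not handle indels that shift the intervals.

```python
def trie_lookup(root, key):
    """Return the cluster id stored at `key`, or None if the path does not exist."""
    node = root
    for base in key:
        node = node.get(base)
        if node is None:
            return None
    return node.get("id")


def trie_insert(root, key, cluster_id):
    """Store `cluster_id` at the end of the path spelled by `key`."""
    node = root
    for base in key:
        node = node.setdefault(base, {})
    node["id"] = cluster_id


def cluster_reads(reads, intervals=((0, 8), (8, 16))):
    """Assign each read a cluster label using one trie per interval."""
    tries = [dict() for _ in intervals]
    labels = []
    next_id = 0
    for read in reads:
        keys = [read[a:b] for a, b in intervals]
        hit = None
        for trie, key in zip(tries, keys):
            hit = trie_lookup(trie, key)
            if hit is not None:
                break
        if hit is None:  # no interval matched: open a new cluster
            hit = next_id
            next_id += 1
        for trie, key in zip(tries, keys):
            if trie_lookup(trie, key) is None:
                trie_insert(trie, key, hit)
        labels.append(hit)
    return labels


reads = [
    "ACGTACGTTTGGCCAA",  # cluster A
    "ACGAACGTTTGGCCAA",  # cluster A, substitution in the first interval
    "GGGGCCCCAAAATTTT",  # cluster B
    "GGGGCCCCAAAATTAT",  # cluster B, substitution in the second interval
]
labels = cluster_reads(reads)
print(labels)
```

Each read costs a number of trie steps proportional to the interval lengths, independent of how many clusters already exist, so the total work grows linearly with the number of reads in this sketch.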
Affiliation(s)
- Guanjin Qu
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Zihui Yan
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Huaming Wu
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
49. Winston C, Organick L, Ward D, Ceze L, Strauss K, Chen YJ. Combinatorial PCR Method for Efficient, Selective Oligo Retrieval from Complex Oligo Pools. ACS Synth Biol 2022; 11:1727-1734. [PMID: 35191684 DOI: 10.1021/acssynbio.1c00482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
With the rapidly decreasing cost of array-based oligo synthesis, large-scale oligo pools offer significant benefits for advanced applications including gene synthesis, CRISPR-based gene editing, and DNA data storage. The selective retrieval of specific oligos from these complex pools traditionally uses polymerase chain reaction (PCR). Designing a large number of primers to use in PCR presents a serious challenge, particularly for DNA data storage, where the size of an oligo pool is orders of magnitude larger than other applications. Although a nested primer address system was recently developed to increase the number of accessible files for DNA storage, it requires more complicated lab protocols and more expensive reagents to achieve high specificity, as well as more DNA address space. Here, we present a new combinatorial PCR method that has none of those drawbacks and outperforms in retrieval specificity. In experiments, we accessed three files that each comprised 1% of a DNA prototype database that contained 81 different files and enriched them to over 99.9% using our combinatorial primer method. Our method provides a viable path for scaling up DNA data storage systems and has broader utility whenever one must access a specific target oligo and can design their own primer regions.
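The scaling argument behind combinatorial addressing can be sketched as follows. The abstract does not give the primer layout, so this example assumes that each file is addressed by one (forward, reverse) primer pair, meaning that n forward and m reverse primers address n × m files. The primer sequences, the payload, and the function names are hypothetical. For simplicity, the reverse primer site is modelled as the literal 3' suffix of the oligo rather than its reverse complement.

```python
from itertools import product


def build_address_table(forward_primers, reverse_primers):
    """Map each file id to one (forward, reverse) primer pair."""
    pairs = product(forward_primers, reverse_primers)
    return {file_id: pair for file_id, pair in enumerate(pairs)}


def retrieve(pool, fwd, rev):
    """Select oligos flanked by both primer sites (an idealized, error-free PCR)."""
    return [oligo for oligo in pool if oligo.startswith(fwd) and oligo.endswith(rev)]


forward = ["AAC", "GGT", "CTA"]  # hypothetical toy primers
reverse = ["TTG", "CCA", "GAT"]
table = build_address_table(forward, reverse)

# One toy oligo per file: forward site + payload + reverse site.
pool = [fwd + "ACGTAC" + rev for fwd, rev in table.values()]

fwd, rev = table[4]
selected = retrieve(pool, fwd, rev)
print(len(table), selected)
```

Under this assumed layout, a database of 81 files such as the one in the abstract would need 9 forward and 9 reverse primers, that is 18 primers rather than 81 distinct pairs.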
Affiliation(s)
- Claris Winston
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Lee Organick
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- David Ward
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Luis Ceze
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Karin Strauss
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
  - Microsoft Research, Redmond, Washington 98052, United States
- Yuan-Jyue Chen
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
  - Microsoft Research, Redmond, Washington 98052, United States
50. Ezekannagha C, Becker A, Heider D, Hattab G. Design considerations for advancing data storage with synthetic DNA for long-term archiving. Mater Today Bio 2022; 15:100306. [PMID: 35677811 PMCID: PMC9167972 DOI: 10.1016/j.mtbio.2022.100306] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/21/2022] [Revised: 05/05/2022] [Accepted: 05/22/2022] [Indexed: 11/22/2022]
Abstract
Deoxyribonucleic acid (DNA) is increasingly emerging as a serious medium for long-term archival data storage because of its remarkable high-capacity, high-storage-density characteristics and its lasting ability to store data for thousands of years. Various encoding algorithms are generally required to store digital information in DNA and to maintain data integrity. Indeed, since DNA is the information carrier, its performance under different processing and storage conditions significantly impacts the capabilities of the data storage system. Therefore, the design of a DNA storage system must meet specific design considerations to be less error-prone, robust and reliable. In this work, we summarize the general processes and technologies employed when using synthetic DNA as a storage medium. We also share the design considerations for sustainable engineering to include viability. We expect this work to provide insight into how sustainable design can be used to develop an efficient and robust synthetic DNA-based storage system for long-term archiving.
Affiliation(s)
- Chisom Ezekannagha
  - Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
  - Corresponding author.
- Anke Becker
  - Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, D-35043, Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Georges Hattab
  - Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
|