1
Shen P, Zheng Y, Zhang C, Li S, Chen Y, Chen Y, Liu Y, Cai Z. DNA storage: The future direction for medical cold data storage. Synth Syst Biotechnol 2025; 10:677-695. [PMID: 40235856 PMCID: PMC11999466 DOI: 10.1016/j.synbio.2025.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 04/17/2025] Open
Abstract
DNA storage, characterized by its durability, data density, and cost-effectiveness, is a promising solution for managing the increasing data volumes in healthcare. This review explores state-of-the-art DNA storage technologies, and provides insights into designing a DNA storage system tailored for medical cold data. We anticipate that a practical approach for medical cold data storage will involve establishing regional, in vitro DNA storage centers that can serve multiple hospitals. The immediacy of DNA storage for medical data hinges on the development of novel, high-density, specialized coding methods. Established commercial techniques, such as DNA chemical synthesis and next-generation sequencing (NGS), along with mixed drying with alkaline salts and refined Polymerase Chain Reaction (PCR), potentially represent the optimal options for data writing, reading, storage, and accessing, respectively. Data security could be ensured by integrating traditional digital encryption with DNA steganography. Although breakthrough developments like artificial nucleotides and DNA nanostructures show potential, they remain in the laboratory research phase. In conclusion, DNA storage is a viable preservation strategy for medical cold data in the near future.
Affiliation(s)
- Peilin Shen
- Department of Urology, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yukui Zheng
- The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- CongYu Zhang
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Shuo Li
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, PR China
- BGI-Shenzhen, Shenzhen, Guangdong Province, PR China
- BGI Hospital Groups, Ltd., Shenzhen, Guangdong Province, PR China
- Yongru Chen
- Department of Emergency Intensive Care Unit, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yongsong Chen
- Department of Endocrinology, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong Province, PR China
- Yuchen Liu
- Shenzhen Institute of Translational Medicine, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Health Science Center, Shenzhen University, Shenzhen, Guangdong Province, PR China
- Key Laboratory of Medical Reprogramming Technology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Guangdong Province, PR China
- Zhiming Cai
- Shantou University Medical College, Shantou, Guangdong Province, PR China
- Key Laboratory of Medical Reprogramming Technology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Guangdong Key Laboratory of Systems Biology and Synthetic Biology for Urogenital Tumors, Shenzhen, Guangdong Province, PR China
- State Engineering Laboratory of Medical Key Technologies Application of Synthetic Biology, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong Province, PR China
- Carson International Cancer Center of Shenzhen University, Shenzhen, Guangdong Province, PR China
2
Qu G, Yan Z, Chen X, Wu H. DNA data storage for biomedical images using HELIX. Nat Comput Sci 2025:10.1038/s43588-025-00793-x. [PMID: 40360759 DOI: 10.1038/s43588-025-00793-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 03/18/2025] [Indexed: 05/15/2025]
Abstract
Deoxyribonucleic acid (DNA) data storage is expected to become a key medium for large-scale data. Biomedical data images typically require substantial storage space over extended periods, making them ideal candidates for DNA data storage. However, existing DNA data storage models are primarily designed for generic files and lack a comprehensive retrieval system for biomedical images. Here, to address this, we propose HELIX, a DNA-based storage system for biomedical images. HELIX introduces an image-compression algorithm tailored to the characteristics of biomedical images, achieving high compression rates and robust error tolerance. In addition, HELIX incorporates an error-correcting encoding algorithm that eliminates the need for indexing, enhancing storage density and decoding speed. We utilize a deep learning-based image repair algorithm for the predictive restoration of partially missing image blocks. In our in vitro experiments, we successfully stored two spatiotemporal genomics images. This sequencing process achieved 97.20% image quality at a depth of 7× coverage.
Affiliation(s)
- Guanjin Qu
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China
- Zihui Yan
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China
- Xin Chen
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China
- Huaming Wu
- Center for Applied Mathematics, Tianjin University, Tianjin, P. R. China.
- State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin, P. R. China.
3
Fan Q, Zhao X, Li J, Liu R, Liu M, Feng Q, Long Y, Fu Y, Zhai J, Pan Q, Li Y. De novo non-canonical nanopore basecalling enables private communication using heavily-modified DNA data at single-molecule level. Nat Commun 2025; 16:4099. [PMID: 40316536 PMCID: PMC12048662 DOI: 10.1038/s41467-025-59357-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 04/16/2025] [Indexed: 05/04/2025] Open
Abstract
Hiding messages in DNA molecules by employing chemical modifications has been suggested for private data storage and transmission at high information density. However, rapidly decoding these "molecular keys" with corresponding basecallers remains challenging. We present DeepSME, a nanopore sequencing and deep-learning based framework towards single-molecule encryption, demonstrated by using 5-hydroxymethylcytosine (5hmC) substitution for individual nucleotide recognition rather than sequential interactions. This non-natural, motif-insensitive methylation disrupts ion current, resulting in a readout failure of 67.2%-100%, concealing the privacy within the DNAs. We further develop an alignment-free DeepSME basecaller as a key to reconstitute the digital information. Our three-stage training pipeline expands k-mer size from 4^6 to 4^9, achieving over 92% precision and recall from scratch. DeepSME deciphers fully 5hmC-concealed text and image within 16× coverage depth with an F1-score of 86.4%, surpassing all the state-of-the-art basecallers. Demonstrated on edge computing devices, DeepSME holds supreme potential for DNA-based private communications and broader bioengineering and medical applications.
Affiliation(s)
- Qingyuan Fan
- School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China
- Xuyang Zhao
- School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China
- Junyao Li
- School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China
- Ronghui Liu
- School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China
- Ming Liu
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Qishun Feng
- National Clinical Research Center for Infectious Diseases, Shenzhen Third People's Hospital, The Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, China
- Yanping Long
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Yang Fu
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Jixian Zhai
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Qing Pan
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yi Li
- School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.
4
Xie L, Cao B, Wen X, Zheng Y, Wang B, Zhou S, Zheng P. ReLume: Enhancing DNA storage data reconstruction with flow network and graph partitioning. Methods 2025; 240:101-112. [PMID: 40268154 DOI: 10.1016/j.ymeth.2025.03.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2025] [Revised: 03/06/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025] Open
Abstract
DNA storage is an ideal alternative to silicon-based storage, but focusing on data writing alone cannot address the inevitable errors and durability issues. Therefore, we propose ReLume, a DNA storage data reconstruction method based on flow networks and graph partitioning technology, which can accomplish the data reconstruction task of millions of reads on a laptop with 24 GB RAM. The results show that ReLume copes well with many types of errors, more than doubles sequence recovery rates, and reduces memory usage by about 60%. ReLume is 10 times more durable than other representative methods, meaning that data can be read without loss after 100 years. Results from the wet lab DNA storage dataset show that ReLume's sequence recovery rates of 73% and 93.2%, respectively, significantly outperform existing methods. In summary, ReLume effectively overcomes the accuracy and hardware limitations and provides a feasible idea for the portability of DNA storage.
Affiliation(s)
- Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Xiaoru Wen
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China.
- Shihua Zhou
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China.
- Pan Zheng
- Department of Accounting and Information Systems, University of Canterbury, 8140 Christchurch, New Zealand
5
Liao R, Luo D, Yang D, Liu J. Opportunities and Challenges of DNA Materials toward Sustainable Development Goals. ACS Nano 2025; 19:11465-11476. [PMID: 40099911 DOI: 10.1021/acsnano.4c17718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
Sustainable development represents a significant and pressing challenge confronting the global community at present. A wide variety of macroscopic engineering systems has been developed to promote sustainable development. Recent advancements in DNA materials have showcased their substantial contributions toward achieving sustainable development goals (SDGs). Compared to nonbiological materials, DNA materials possess exceptional properties such as genetic functionality, molecular programmability, recognition capabilities, and biocompatibility. These unique characteristics enable DNA materials to serve as general and versatile substrates beyond their genetic role. Consequently, they can be used to develop DNA-based engineering systems that offer versatile solutions to support sustainable development. In this Perspective, we critically examine the opportunities that DNA-based engineering systems present in contributing to the achievement of the SDGs within various real-world scenarios. We establish direct relationships between DNA-based engineering systems and the SDGs, highlighting their inherent merits in accelerating sustainable development. Furthermore, in order to successfully achieve SDGs, we address the challenges associated with these systems and emphasize the urgent need for developing multifunctional, reliable, biosafe, and intelligent DNA-based engineering systems to overcome these challenges.
Affiliation(s)
- Renkuan Liao
- College of Land Science and Technology, Key Laboratory of Arable Land Conservation in North China, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, People's Republic of China
- State Key Laboratory of Efficient Utilization of Agricultural Water Resources, China Agricultural University, Beijing 100083, People's Republic of China
- Dan Luo
- Department of Biological & Environmental Engineering, Cornell University, Ithaca, New York 14853, United States
- Dayong Yang
- Department of Chemistry, State Key Laboratory of Molecular Engineering of Polymers, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, College of Chemistry and Materials, Fudan University, Shanghai 200438, People's Republic of China
- Jianguo Liu
- Center for Systems Integration and Sustainability, Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan 48823, United States
6
Pretorius IS, Dixon TA, Boers M, Paulsen IT, Johnson DL. The coming wave of confluent biosynthetic, bioinformational and bioengineering technologies. Nat Commun 2025; 16:2959. [PMID: 40140397 PMCID: PMC11947079 DOI: 10.1038/s41467-025-58030-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Accepted: 03/11/2025] [Indexed: 03/28/2025] Open
Abstract
Information and energy flows form the basis of all economic activity, with advanced technologies underpinning both. Profound uncertainties caused by geostrategic forces have accelerated a trillion-dollar race for technological superiority. The result is an onrush of "technovation" at the nexus of synthetic biotechnologies, information technologies, nanotechnologies and engineering technologies. This article explores recent breakthroughs in integrating chip technologies and synthetic bioinformational engineering. It investigates prospects of biomolecules as carriers of stored digital data, synthetic cells-on-a-chip, and hybrid semiconductors and next-generation artificial intelligence processors. Consilience (unity of knowledge) redefines possibilities emerging from the living interface of biologically-inspired engineering and engineering-enabled biology.
Affiliation(s)
- Isak S Pretorius
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia.
- Thomas A Dixon
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
- Michael Boers
- Silicon Platforms Laboratory, Macquarie University, Sydney, NSW, Australia
- Ian T Paulsen
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
- Daniel L Johnson
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
7
Ge Q, Qin R, Liu S, Guo Q, Han C, Chen W. Pragmatic soft-decision data readout of encoded large DNA. Brief Bioinform 2025; 26:bbaf102. [PMID: 40091194 PMCID: PMC11911122 DOI: 10.1093/bib/bbaf102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2024] [Revised: 02/06/2025] [Accepted: 02/24/2025] [Indexed: 03/19/2025] Open
Abstract
The encoded large DNA can be cloned and stored in vivo, capable of write-once and stable replication for multiple retrievals, offering potential in economic data archiving. Nanopore sequencing is advantageous in data access of large DNA due to its rapidity and long-read sequencing capability. However, the data readout is commonly limited by insertion and deletion (indel) errors and sequence assembly complexity. Here, a pragmatic soft-decision data readout is presented, achieving assembly-free sequence reconstruction, indel error correction, and ultra-low coverage data readout. Specifically, the watermark is cleverly embedded within large DNA fragments, allowing for the direct localization of raw reads via watermark alignment to avoid complex read assembly. A soft-decision forward-backward algorithm is proposed, which can identify indel errors and provide probability information to the error correction code, enabling error-free data recovery. Additionally, a minimum state transition is maintained, and a read segmentation is incorporated to achieve fast information reading. The readout assays for two circular plasmids (~51 kb) with different coding rates were demonstrated and achieved error-free recovery directly from noisy reads (error rate ~1%) at coverage of 1-4×. Simulations conducted on large-scale datasets across various error rates further confirm the scalability of the method and its robust performance under extreme conditions. This readout method enables nearly single-molecule recovery of large DNA, particularly suitable for rapid readout of DNA storage.
Affiliation(s)
- Qi Ge
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Rui Qin
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Shuang Liu
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Quan Guo
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Changcai Han
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Weigang Chen
- School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, School of Microelectronics, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Frontier Science Center for Synthetic Biology, Ministry of Education, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
8
Su Y, Chu L, Lin W, Yao X, Xu P, Liu W. A Robust and Efficient Representation-based DNA Storage Architecture by Deep Learning. Small Methods 2025; 9:e2400959. [PMID: 40114483 DOI: 10.1002/smtd.202400959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 12/09/2024] [Indexed: 03/22/2025]
Abstract
As one main form of multimedia data, images play a critical role in various applications. In this paper, a representation-based architecture is proposed which takes advantage of the outstanding representation and image-generation abilities of deep learning (DL). This architecture includes two DL models: an autoencoder and a U-Net network which achieve the representation, construction, and refinement of images from the noisy reads in DNA storage. Simulation experiments demonstrate that it can reconstruct images of moderate quality in scenarios where insertion-deletion-substitution (IDS) errors are less than 6%. Combined with the feature quantization, it also offers a flexible way to achieve a balanced trade-off between compression ratio and image quality by selecting an approximate representation channel number. Additionally, the quality of images can be boosted by using multiple reads which are a common situation in DNA storage. A wet lab practice that successfully reconstructs an image stored in 14 plasmids further proves the feasibility of the proposed architecture. Instead of storing the original image information, the representation-based architecture provides a competitive solution which achieves robust and efficient DNA storage for large-scale image applications.
Affiliation(s)
- Yanqing Su
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Ling Chu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Wanmin Lin
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Xiangyu Yao
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Peng Xu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, 510006, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Wenbin Liu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, 510006, China
9
Wang S, Yang D, Li J, Bao H, Pan S, Huang K, Shao J, Yang Q, Chen X, Jiang X, Wang P, Yang Y. DNA Origami Framework-Based Spatial Nanochip for Circular ssDNA Assembly and Data Storage. Small 2025; 21:e2410391. [PMID: 39846277 DOI: 10.1002/smll.202410391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 12/31/2024] [Indexed: 01/24/2025]
Abstract
A 3D DNA spatial chip (DSC) based on an icosahedral DNA origami framework is introduced to construct customized circular single-stranded DNA (c-ssDNA) for data storage. Within the confined space of the DSC, thirty addressable location sequences extending from the framework edges are available for designing circular paths and directing the assembly of a series of information oligonucleotides for efficient ligation. This strategy is verified by constructing c-ssDNAs from up to 15 fragments to encode two poems (800 and 860 nucleotides). Using orthogonal location sites, both poems are simultaneously assembled within a single DSC and read out together. Rolling circle amplification (RCA) and nanopore sequencing enable complete retrieval of all the above data files. The DSCs with distinct fluorescent labels and capture sequences are further functionalized on their outer surfaces, allowing magnetic bead-based retrieval and rapid identification of specific datasets from a mixture. Moreover, the DSCs maintain data integrity after storage under various conditions. These findings demonstrate that the 3D DNA spatial chip provides an efficient approach for assembling long c-ssDNA for data storage, addressing limitations by reducing redundancy, enhancing stability, and enabling multiplexed storage and retrieval.
Affiliation(s)
- Shengwen Wang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Donglei Yang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Jiankai Li
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, No. 1088 Xueyuan Rd., Nanshan District, Shenzhen, Guangdong, 518055, China
- Hongliang Bao
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Shufan Pan
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Kui Huang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Jialin Shao
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, No. 1088 Xueyuan Rd., Nanshan District, Shenzhen, Guangdong, 518055, China
- Qiulan Yang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Xiao Chen
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Xingyu Jiang
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, No. 1088 Xueyuan Rd., Nanshan District, Shenzhen, Guangdong, 518055, China
- Pengfei Wang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Yang Yang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Department of Laboratory Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
10
Yan Z, Zhang H, Lu B, Han T, Tong X, Yuan Y. DNA palette code for time-series archival data storage. Natl Sci Rev 2025; 12:nwae321. [PMID: 39758123 PMCID: PMC11697981 DOI: 10.1093/nsr/nwae321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 08/21/2024] [Accepted: 08/28/2024] [Indexed: 01/07/2025] Open
Abstract
The long-term preservation of large volumes of infrequently accessed cold data poses challenges to the storage community. Deoxyribonucleic acid (DNA) is considered a promising solution due to its inherent physical stability and significant storage density. The information density and decoding sequence coverage are two important metrics that influence the efficiency of DNA data storage. In this study, we propose a novel coding scheme called the DNA palette code, which is suitable for cold data, especially time-series archival datasets. These datasets are not frequently accessed, but require reliable long-term storage for retrospective research. The DNA palette code employs unordered combinations of index-free oligonucleotides to represent binary information. It can achieve high net information density encoding and lossless decoding with low sequencing coverage. When sequencing reads are corrupted, it can still effectively recover partial information, preventing the complete failure of file retrieval. The in vitro testing of clinical brain magnetic resonance imaging (MRI) data storage, as well as simulation validations using large-scale public MRI datasets (10 GB), planetary science datasets and meteorological datasets, demonstrates the advantages of our coding scheme, including high net information density, low decoding sequence coverage and wide applicability.
Affiliation(s)
- Zihui Yan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Haoran Zhang
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Boyuan Lu
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
- Tong Han
- Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China
- Xiaoguang Tong
- Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China
- Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
11
Ma Y, Chen S, Xu Q, Lu Z, Bi K. High-Risk Sequence Prediction Model in DNA Storage: The LQSF Method. IEEE Trans Nanobioscience 2025; 24:89-101. [PMID: 38976468 DOI: 10.1109/tnb.2024.3424576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Traditional DNA storage technologies rely on passive filtering methods for error correction during synthesis and sequencing, which result in redundancy and inadequate error correction. Addressing this, the Low Quality Sequence Filter (LQSF) was introduced, an innovative method employing deep learning models to predict high-risk sequences. The LQSF approach leverages a classification model trained on error-prone sequences, enabling efficient pre-sequencing filtration of low-quality sequences and reducing time and resources in subsequent stages. Analysis has demonstrated a clear distinction between high and low-quality sequences, confirming the efficacy of the LQSF method. Extensive training and testing were conducted across various neural networks and test sets. The results showed all models achieving an AUC value above 0.91 on ROC curves and over 0.95 on PR curves across different datasets. Notably, models such as Alexnet, VGG16, and VGG19 achieved a perfect AUC of 1.0 on the Original dataset, highlighting their precision in classification. Further validation using Illumina sequencing data substantiated a strong correlation between model scores and sequence error-proneness, emphasizing the model's applicability. The LQSF method marks a significant advancement in DNA storage technology, introducing active sequence filtering at the encoding stage. This pioneering approach holds substantial promise for future DNA storage research and applications.
12
Şatır E. A DNA Data Storage Method Using Spatial Encoding Based Lossless Compression. Entropy (Basel, Switzerland) 2024; 26:1116. [PMID: 39766746 PMCID: PMC11675758 DOI: 10.3390/e26121116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 12/11/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
With the rapid increase in global data and rapid development of information technology, DNA sequences have been collected and manipulated on computers. This has yielded a new and attractive field of bioinformatics, DNA storage, where DNA has been considered as a great potential storage medium. It is known that one gram of DNA can store 215 GB of data, and the data stored in the DNA can be preserved for tens of thousands of years. In this study, a lossless and reversible DNA data storage method was proposed. The proposed approach employs a vector representation of each DNA base in a two-dimensional (2D) spatial domain for both encoding and decoding. The structure of the proposed method is reversible, rendering the decompression procedure possible. Experiments were performed to investigate the capacity, compression ratio, stability, and reliability. The obtained results show that the proposed method is much more efficient in terms of capacity than other known algorithms in the literature.
Affiliation(s)
- Esra Şatır
- Computer Engineering Department, Düzce University, 81620 Düzce, Turkey
13
Bi K, Xu Q, Lai X, Zhao X, Lu Z. Multi-file dynamic compression method based on classification algorithm in DNA storage. Med Biol Eng Comput 2024; 62:3623-3635. [PMID: 38922373 DOI: 10.1007/s11517-024-03156-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 06/17/2024] [Indexed: 06/27/2024]
Abstract
The exponential growth in data volume has necessitated the adoption of alternative storage solutions, and DNA storage stands out as the most promising solution. However, the exorbitant costs associated with synthesis and sequencing impeded its development. Pre-compressing the data is recognized as one of the most effective approaches for reducing storage costs. However, different compression methods yield varying compression ratios for the same file, and compressing a large number of files with a single method may not achieve the maximum compression ratio. This study proposes a multi-file dynamic compression method based on machine learning classification algorithms that selects the appropriate compression method for each file to minimize the amount of data stored into DNA as much as possible. Firstly, four different compression methods are applied to the collected files. Subsequently, the optimal compression method is selected as a label, as well as the file type and size are used as features, which are put into seven machine learning classification algorithms for training. The results demonstrate that k-nearest neighbor outperforms other machine learning algorithms on the validation set and test set most of the time, achieving an accuracy rate of over 85% and showing less volatility. Additionally, the compression rate of 30.85% can be achieved according to k-nearest neighbor model, more than 4.5% compared to the traditional single compression method, resulting in significant cost savings for DNA storage in the range of $0.48 to 3 billion/TB. In comparison to the traditional compression method, the multi-file dynamic compression method demonstrates a more significant compression effect when compressing multiple files. Therefore, it can considerably decrease the cost of DNA storage and facilitate the widespread implementation of DNA storage technology.
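Illustrative sketch (not the authors' implementation): the labeling step described in this abstract, in which several compression methods are applied to each file and the best-performing one becomes the training label, might look roughly like the following. The abstract does not name its four methods, so the three Python standard-library compressors used here, and the names `COMPRESSORS` and `best_method`, are assumptions chosen only so the example runs.

```python
import bz2
import lzma
import zlib

# Hypothetical candidate set; the paper's four methods are not listed in the abstract.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def best_method(data: bytes):
    """Return (name, compressed_size) of the compressor giving the smallest output."""
    sizes = {name: len(compress(data)) for name, compress in COMPRESSORS.items()}
    name = min(sizes, key=sizes.get)
    return name, sizes[name]


if __name__ == "__main__":
    sample = b"ACGT" * 1000
    label, size = best_method(sample)
    print(f"best method: {label}, {len(sample)} -> {size} bytes")
```

In the pipeline the abstract describes, such a label would be paired with file type and size as features for a classifier such as k-nearest neighbor; that training step is omitted here.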
Affiliation(s)
- Kun Bi
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China.
- Qi Xu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Xin Lai
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Xiangwei Zhao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Zuhong Lu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
14
Qin Y, Zhu F, Xi B, Song L. Robust multi-read reconstruction from noisy clusters using deep neural network for DNA storage. Comput Struct Biotechnol J 2024; 23:1076-1087. [PMID: 39807110 PMCID: PMC11725466 DOI: 10.1016/j.csbj.2024.02.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/17/2024] [Accepted: 02/26/2024] [Indexed: 01/16/2025] Open
Abstract
DNA holds immense potential as an emerging data storage medium. However, the recovery of information in DNA storage systems faces challenges posed by various errors, including IDS errors, strand breaks, and rearrangements, inevitably introduced during synthesis, amplification, sequencing, and storage processes. Sequence reconstruction, crucial for decoding, involves inferring the DNA reference from a cluster of erroneous copies. While most methods assume equal contributions from all reads within a cluster as noisy copies of the same reference, they often overlook the existence of contaminated sequences caused by DNA breaks, rearrangements, or mis-clustering reads. To address this issue, we propose RobuSeqNet, a robust multi-read reconstruction neural network specifically designed to robustly reconstruct multiple reads, accommodating noisy clusters with strand breakage, rearrangements, and mis-clustered strands. Leveraging the attention mechanism and an elaborate network design, RobuSeqNet exhibits resilience to highly-noisy clusters and effectively deals with in-strand IDS errors. The effectiveness and robustness of the proposed method are validated on three representative next-generation sequencing datasets. Results demonstrate that RobuSeqNet maintains high sequence reconstruction success rates of 99.74%, 99.58%, and 96.44% across three datasets, even in the presence of noisy clusters containing up to 20% contaminated sequences, outperforming known sequence reconstruction models. Additionally, in scenarios without contaminated sequences, it exhibits comparable performance to existing models, achieving success rates of 99.88%, 99.82%, and 97.68% across the three datasets.
Affiliation(s)
- Yun Qin
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Fei Zhu
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Bo Xi
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Lifu Song
- Systems Biology Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Haihe Laboratory of Synthetic Biology, Tianjin, China
15
Bar-Lev D, Sabary O, Yaakobi E. The zettabyte era is in our DNA. Nature Computational Science 2024; 4:813-817. [PMID: 39516373 DOI: 10.1038/s43588-024-00717-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 10/03/2024] [Indexed: 11/16/2024]
Abstract
This Perspective surveys the critical computational challenges associated with in vitro DNA-based data storage. As digital data expand exponentially, traditional storage media are becoming less viable, making DNA a promising solution due to its density and durability. However, numerous obstacles remain, including error correction, data retrieval from large volumes of noisy reads, and scalability. The Perspective also highlights challenges for DNA-based data centers, such as fault tolerance, random access, and data removal, which must be addressed to make DNA-based storage practical.
Affiliation(s)
- Daniella Bar-Lev
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
- Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel.
16
Guan X, Zhu C, Dong Y, Liu D, Mao C. Multiple-unit interlocking enhances the single-stranded tiles assembly of DNA nanostructures. Nanoscale 2024; 16:19642-19648. [PMID: 39382240 DOI: 10.1039/d4nr03288h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
Single-stranded tiles (DNA brick) assembly has provided a simple and modular tool for constructing nanostructures with the potential for numerous applications. However, in this strategy, the short-strand building blocks are susceptible to environmental fluctuations and bring about rapid dissociation during assembly, resulting in instability and prolonged annealing. Thus, developing new strategies which can enhance the stability and accelerate the assembly process of DNA bricks is important. In this study, we applied the kinetically interlocking multiple-unit (KIMU) strategy to tune the process of DNA brick assembly by adopting long DNA strands as building blocks, ranging from tens of to 1000 nucleotides. We constructed a series of DNA structures with improved stability over DNA bricks. Furthermore, the annealing process could be accelerated by increasing the number of units. Our study demonstrated that DNA assembly based on the KIMU strategy using multiple-unit DNA strands could be a promising method for constructing relatively stable DNA nanostructures.
Affiliation(s)
- Xiangxiang Guan
- Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China.
- Engineering Research Center of Advanced Rare Earth Materials, (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- Chenyou Zhu
- Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China.
- Engineering Research Center of Advanced Rare Earth Materials, (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- Yuanchen Dong
- CAS Key Laboratory of Colloid Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China.
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Dongsheng Liu
- Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China.
- Engineering Research Center of Advanced Rare Earth Materials, (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- Chengde Mao
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, USA.
17
Rasool A, Hong J, Hong Z, Li Y, Zou C, Chen H, Qu Q, Wang Y, Jiang Q, Huang X, Dai J. An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024; 8:e2301585. [PMID: 38807543 DOI: 10.1002/smtd.202301585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 03/29/2024] [Indexed: 05/30/2024]
Abstract
DNA-based data storage is a new technology in computational and synthetic biology, that offers a solution for long-term, high-density data archiving. Given the critical importance of medical data in advancing human health, there is a growing interest in developing an effective medical data storage system based on DNA. Data integrity, accuracy, reliability, and efficient retrieval are all significant concerns. Therefore, this study proposes an Effective DNA Storage (EDS) approach for archiving medical MRI data. The EDS approach incorporates three key components (i) a novel fraction strategy to address the critical issue of rotating encoding, which often leads to data loss due to single base error propagation; (ii) a novel rule-based quaternary transcoding method that satisfies bio-constraints and ensure reliable mapping; and (iii) an indexing technique designed to simplify random search and access. The effectiveness of this approach is validated through computer simulations and biological experiments, confirming its practicality. The EDS approach outperforms existing methods, providing superior control over bio-constraints and reducing computational time. The results and code provided in this study open new avenues for practical DNA storage of medical MRI data, offering promising prospects for the future of medical data archiving and retrieval.
Affiliation(s)
- Abdur Rasool
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
- Jingwei Hong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Zhiling Hong
- Quanzhou Development Group Co., Ltd, Quanzhou, 362000, China
- Yuanzhen Li
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Chao Zou
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, China
- Qiang Qu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yang Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingshan Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaoluo Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Junbiao Dai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518055, China
18
Yu M, Tang X, Li Z, Wang W, Wang S, Li M, Yu Q, Xie S, Zuo X, Chen C. High-throughput DNA synthesis for data storage. Chem Soc Rev 2024; 53:4463-4489. [PMID: 38498347 DOI: 10.1039/d3cs00469d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
With the explosion of digital world, the dramatically increasing data volume is expected to reach 175 ZB (1 ZB = 10^12 GB) in 2025. Storing such huge global data would consume tons of resources. Fortunately, it has been found that the deoxyribonucleic acid (DNA) molecule is the most compact and durable information storage medium in the world so far. Its high coding density and long-term preservation properties make itself one of the best data storage carriers for the future. High-throughput DNA synthesis is a key technology for "DNA data storage", which encodes binary data stream (0/1) into quaternary long DNA sequences consisting of four bases (A/G/C/T). In this review, the workflow of DNA data storage and the basic methods of artificial DNA synthesis technology are outlined first. Then, the technical characteristics of different synthesis methods and the state-of-the-art of representative commercial companies, with a primary focus on silicon chip microarray-based synthesis and novel enzymatic DNA synthesis are presented. Finally, the recent status of DNA storage and new opportunities for future development in the field of high-throughput, large-scale DNA synthesis technology are summarized.
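Illustrative sketch: the binary-to-quaternary encoding mentioned in this abstract can be pictured with the simplest possible mapping, two bits per base. The abstract states only that a 0/1 stream is encoded over A/G/C/T; the specific bit-pair assignment and the function names below are assumptions, and practical schemes additionally enforce constraints such as GC balance and homopolymer limits and add error correction, none of which is modeled here.

```python
# Hypothetical bit-pair assignment; any fixed one-to-one mapping would serve.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four bases (two bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"DNA")
    print(strand, decode(strand))
```

At two bits per base this mapping reaches the theoretical maximum density for a four-letter alphabet, which is why real codecs trade some of it away to satisfy synthesis and sequencing constraints.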
Affiliation(s)
- Meng Yu
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaohui Tang
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Zhenhua Li
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Weidong Wang
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Shaopeng Wang
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Min Li
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Qiuliyang Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
- Sijia Xie
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaolei Zuo
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China.
- Chang Chen
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China.
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050, Shanghai, China
19
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] Open
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China.
- Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
20
Li Y, Zhang H, Chen Y, Shen Y, Ping Z. DNA Bloom Filter enables anti-contamination and file version control for DNA-based data storage. Brief Bioinform 2024; 25:bbae125. [PMID: 38555478 PMCID: PMC10981766 DOI: 10.1093/bib/bbae125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 02/09/2024] [Accepted: 02/27/2024] [Indexed: 04/02/2024] Open
Abstract
DNA storage is one of the most promising ways for future information storage due to its high data storage density, durable storage time and low maintenance cost. However, errors are inevitable during synthesizing, storing and sequencing. Currently, many error correction algorithms have been developed to ensure accurate information retrieval, but they will decrease storage density or increase computing complexity. Here, we apply the Bloom Filter, a space-efficient probabilistic data structure, to DNA storage to achieve the anti-error, or anti-contamination function. This method only needs the original correct DNA sequences (referred to as target sequences) to produce a corresponding data structure, which will filter out almost all the incorrect sequences (referred to as non-target sequences) during sequencing data analysis. Experimental results demonstrate the universal and efficient filtering capabilities of our method. Furthermore, we employ the Counting Bloom Filter to achieve the file version control function, which significantly reduces synthesis costs when modifying DNA-form files. To achieve cost-efficient file version control function, a modified system based on yin-yang codec is developed.
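Illustrative sketch (a generic textbook Bloom filter, not the authors' implementation): the filtering idea in this abstract, building a compact structure from the known target sequences and then discarding reads that are not members, could be prototyped as below. The bit-array size, the number of hash functions, the salted-SHA-256 hashing and the helper `filter_reads` are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Space-efficient set membership with no false negatives and rare false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def filter_reads(reads, bloom):
    """Keep only reads that the filter reports as (probably) target sequences."""
    return [read for read in reads if read in bloom]


if __name__ == "__main__":
    bloom = BloomFilter()
    for target in ["ACGTACGT", "TTGACCAA"]:
        bloom.add(target)
    print(filter_reads(["ACGTACGT", "GGGGGGGG", "TTGACCAA"], bloom))
```

A plain Bloom filter supports insertion only; the Counting Bloom Filter that the abstract uses for file version control replaces each bit with a small counter so that entries can also be removed.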
Affiliation(s)
- Yiming Li
- BGI Research, Shenzhen, 518083, China
- BGI Research, Changzhou, 213299, China
- Haoling Zhang
- BGI Research, Shenzhen, 518083, China
- Living Systems Lab, BESE, CEMSE, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Yue Shen
- BGI Research, Shenzhen, 518083, China
- BGI Research, Changzhou, 213299, China
- Zhi Ping
- BGI Research, Shenzhen, 518083, China
- BGI Research, Changzhou, 213299, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 518172, China
21
Ding L, Wu S, Hou Z, Li A, Xu Y, Feng H, Pan W, Ruan J. Improving error-correcting capability in DNA digital storage via soft-decision decoding. Natl Sci Rev 2024; 11:nwad229. [PMID: 38213525 PMCID: PMC10776348 DOI: 10.1093/nsr/nwad229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/20/2023] [Revised: 08/03/2023] [Accepted: 08/15/2023] [Indexed: 01/13/2024] Open
Abstract
Error-correcting codes (ECCs) employed in the state-of-the-art DNA digital storage (DDS) systems suffer from a trade-off between error-correcting capability and the proportion of redundancy. To address this issue, in this study, we introduce soft-decision decoding approach into DDS by proposing a DNA-specific error prediction model and a series of novel strategies. We demonstrate the effectiveness of our approach through a proof-of-concept DDS system based on Reed-Solomon (RS) code, named as Derrick. Derrick shows significant improvement in error-correcting capability without involving additional redundancy in both in vitro and in silico experiments, using various sequencing technologies such as Illumina, PacBio and Oxford Nanopore Technology (ONT). Notably, in vitro experiments using ONT sequencing at a depth of 7× reveal that Derrick, compared with the traditional hard-decision decoding strategy, doubles the error-correcting capability of RS code, decreases the proportion of matrices with decoding-failure by 229-fold, and amplifies the potential maximum storage volume by impressive 32 388-fold. Also, Derrick surpasses 'state-of-the-art' DDS systems by comprehensively considering the information density and the minimum sequencing depth required for complete information recovery. Crucially, the soft-decision decoding strategy and key steps of Derrick are generalizable to other ECCs' decoding algorithms.
Affiliation(s)
- Lulu Ding
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Shigang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhihao Hou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, China
- Alun Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Yaping Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Hu Feng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
|
22
|
Sabary O, Yucovich A, Shapira G, Yaakobi E. Reconstruction algorithms for DNA-storage systems. Sci Rep 2024; 14:1951. [PMID: 38263421 PMCID: PMC10806084 DOI: 10.1038/s41598-024-51730-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024] Open
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string passes through the DNA-storage channel, which introduces deletion, insertion and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, which are used for the shortest common supersequence and the longest common subsequence problems, in order to decode the original string. Our algorithms place no limitations on the input or the number of traces and, moreover, perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a new synthesized dataset, and are shown to outperform previous algorithms in reconstruction accuracy.
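The two building blocks named in this abstract, edit distance and the longest common subsequence (LCS), can be sketched with textbook dynamic programming. The Python below is only an illustration of those standard routines, not the authors' reconstruction algorithms; the function names `edit_distance` and `lcs` are assumptions chosen for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def lcs(a: str, b: str) -> str:
    """One longest common subsequence of a and b."""
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)
```

For example, `edit_distance(original, estimate)` gives the quantity that the reconstruction problem seeks to minimize, and `lcs(trace_1, trace_2)` returns bases shared in order by two traces.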
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel.
| | - Alexander Yucovich
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Guy Shapira
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
|
23
|
Sadremomtaz A, Glass RF, Guerrero JE, LaJeunesse DR, Josephs EA, Zadegan R. Digital data storage on DNA tape using CRISPR base editors. Nat Commun 2023; 14:6472. [PMID: 37833288 PMCID: PMC10576057 DOI: 10.1038/s41467-023-42223-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 10/04/2023] [Indexed: 10/15/2023] Open
Abstract
As the archival digital memory industry approaches its physical limits while demand increases significantly, alternatives are emerging. Recent efforts have demonstrated DNA's enormous potential as a digital storage medium with superior information durability, capacity, and energy consumption. However, the majority of the proposed systems require on-demand de novo DNA synthesis techniques that produce a large amount of toxic waste and are therefore neither industrially scalable nor environmentally friendly. Inspired by the architecture of semiconductor memory devices and recent developments in gene editing, we created a molecular digital data storage system called "DNA Mutational Overwriting Storage" (DMOS) that stores information by leveraging combinatorial, addressable, orthogonal, and independent in vitro CRISPR base-editing reactions to write data on a blank pool of greenly synthesized DNA tapes. As a proof of concept, this work illustrates the writing and accurate reading of both a bitmap representation of our school's logo and the title of this study on the DNA tapes.
Affiliation(s)
- Afsaneh Sadremomtaz
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, NC A&T State University, Greensboro, NC, USA
| | - Robert F Glass
- Department of Nanoscience, Joint School of Nanoscience and Nanoengineering, UNC Greensboro, Greensboro, NC, USA
| | - Jorge Eduardo Guerrero
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, NC A&T State University, Greensboro, NC, USA
| | - Dennis R LaJeunesse
- Department of Nanoscience, Joint School of Nanoscience and Nanoengineering, UNC Greensboro, Greensboro, NC, USA
| | - Eric A Josephs
- Department of Nanoscience, Joint School of Nanoscience and Nanoengineering, UNC Greensboro, Greensboro, NC, USA.
| | - Reza Zadegan
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, NC A&T State University, Greensboro, NC, USA.
|
24
|
Yang X, Lai L, Qiang X, Deng M, Xie Y, Shi X, Kou Z. Towards Chinese text and DNA shift encoding scheme based on biomass plasmid storage. Front Bioinform 2023; 3:1276934. [PMID: 37900965 PMCID: PMC10602677 DOI: 10.3389/fbinf.2023.1276934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 09/28/2023] [Indexed: 10/31/2023] Open
Abstract
DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base data that meets biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for storing Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any strand of DNA, even one that has an excessively long homopolymer. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
Affiliation(s)
- Xu Yang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Langwen Lai
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Xiaoli Qiang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Ming Deng
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Yuhao Xie
- School of Mathematical Science, Inner Mongolia University, Hohhot, China
| | - Xiaolong Shi
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Zheng Kou
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
|
25
|
Rasool A, Hong J, Jiang Q, Chen H, Qu Q. BO-DNA: Biologically optimized encoding model for a highly-reliable DNA data storage. Comput Biol Med 2023; 165:107404. [PMID: 37666064 DOI: 10.1016/j.compbiomed.2023.107404] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2023] [Revised: 08/13/2023] [Accepted: 08/26/2023] [Indexed: 09/06/2023]
Abstract
DNA data storage is a promising technology that utilizes computer simulation and synthetic biology, offering high-density and reliable digital information storage. It is challenging to store massive data in a small amount of DNA without losing the original data, since nonspecific hybridization errors occur frequently and severely affect the reliability of stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to overcome the reliability problem. The BO-DNA model is developed using a new rule-based mapping method to avoid data drop during the transcoding of binary data to premier nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps to minimize the nonspecific hybridization errors. The robustness of BO-DNA is evaluated with four bio-constraints to confirm the reliability of newly generated DNA sequences. Experimentally, different medical images are encoded and decoded successfully, with lower bounds improved by 12%-59% and optimally constrained DNA sequences reported at an average density of 1.77 bit/nt. BO-DNA's results demonstrate substantial advantages in constructing reliable DNA data storage.
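The abstract does not name its four bio-constraints. As a hedged illustration, the Python sketch below checks two constraints that are common in DNA storage encoding, GC content and maximum homopolymer run length. The function names and the thresholds (40%-60% GC, runs of at most 3) are assumptions for this example, not values taken from the BO-DNA paper.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def satisfies_constraints(seq: str,
                          gc_low: float = 0.4,   # assumed lower GC bound
                          gc_high: float = 0.6,  # assumed upper GC bound
                          max_run: int = 3) -> bool:  # assumed run limit
    """True if seq meets both illustrative constraints."""
    return (gc_low <= gc_content(seq) <= gc_high
            and max_homopolymer_run(seq) <= max_run)
```

With these assumed thresholds, `satisfies_constraints("ACGTACGT")` passes, while `"AAAAACGT"` fails on both the run length and the GC fraction.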
Affiliation(s)
- Abdur Rasool
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, 100049, China.
| | - Jingwei Hong
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
| | - Qingshan Jiang
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, Guangdong, China
| | - Qiang Qu
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
|
26
|
Zhao Y, Cao B, Wang P, Wang K, Wang B. DBTRG: De Bruijn Trim rotation graph encoding for reliable DNA storage. Comput Struct Biotechnol J 2023; 21:4469-4477. [PMID: 37736298 PMCID: PMC10510065 DOI: 10.1016/j.csbj.2023.09.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/20/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/23/2023] Open
Abstract
DNA is a high-density, long-term stable, and scalable storage medium that can meet the increased demands on storage media resulting from the exponential growth of data. Existing DNA storage encoding schemes tend to achieve high-density storage but do not fully consider the local and global stability of DNA sequences or the read and write accuracy of the stored information. To address these problems, this article presents a graph-based De Bruijn Trim Rotation Graph (DBTRG) encoding scheme. Through XOR between the proposed dynamic binary sequence and the original binary sequence, k-mers can be divided into the De Bruijn Trim graph, and the stored information can be compressed according to the overlapping relationship. The simulated experimental results show that DBTRG ensures base balance and diversity, reduces the likelihood of undesired motifs, and improves the stability of DNA storage and data recovery. Furthermore, DBTRG maintains an encoding rate of 1.92 while storing 510 KB images and introduces novel approaches and concepts for DNA storage encoding methods.
Affiliation(s)
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Penghao Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
| | - Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
|
27
|
Park SJ, Kim S, Jeong J, No A, No JS, Park H. Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads. Bioinformatics 2023; 39:btad548. [PMID: 37669160 PMCID: PMC10500082 DOI: 10.1093/bioinformatics/btad548] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] Open
Abstract
Motivation: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. Results: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works. Availability and implementation: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.
Affiliation(s)
- Seong-Joon Park
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Sunghwan Kim
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, South Korea
| | - Jaeho Jeong
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Albert No
- Department of Electronic and Electrical Engineering, Hongik University, Seoul 04066, South Korea
| | - Jong-Seon No
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
| | - Hosung Park
- Department of Computer Engineering, Chonnam National University, Gwangju 61186, South Korea
- Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 61186, South Korea
|
28
|
Xu C, Ma B, Dong X, Lei L, Hao Q, Zhao C, Liu H. Assembly of Reusable DNA Blocks for Data Storage Using the Principle of Movable Type Printing. ACS Appl Mater Interfaces 2023; 15:24097-24108. [PMID: 37184884 DOI: 10.1021/acsami.3c01860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Due to its high coding density and longevity, DNA is a compelling data storage alternative. However, current DNA data storage systems rely on the de novo synthesis of enormous DNA molecules, resulting in low data editability, high synthesis costs, and restrictions on further applications. Here, we demonstrate the programmable assembly of reusable DNA blocks for versatile data storage using the ancient movable type printing principle. Digital data are first encoded into nucleotide sequences in DNA hairpins, which are then synthesized and immobilized on solid beads as modular DNA blocks. Using DNA polymerase-catalyzed primer exchange reaction, data can be continuously replicated from hairpins on DNA blocks and attached to a primer in tandem to produce new information. The assembly of DNA blocks is highly programmable, producing various data by reusing a finite number of DNA blocks and reducing synthesis costs (∼1718 versus 3000 to 30,000 US$ per megabyte using conventional methods). We demonstrate the flexible assembly of texts, images, and random numbers using DNA blocks and the integration with DNA logic circuits to manipulate data synthesis. This work suggests a flexible paradigm by recombining already synthesized DNA to build cost-effective and intelligent DNA data storage systems.
Affiliation(s)
- Chengtao Xu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Biao Ma
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Xing Dong
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Lanjie Lei
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Qing Hao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Chao Zhao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
| | - Hong Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University Institution, 2# Sipailou, Nanjing, Jiangsu 210096, China
|
29
|
Zan X, Chu L, Xie R, Su Y, Yao X, Xu P, Liu W. An image cryptography method by highly error-prone DNA storage channel. Front Bioeng Biotechnol 2023; 11:1173763. [PMID: 37152655 PMCID: PMC10154519 DOI: 10.3389/fbioe.2023.1173763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 03/30/2023] [Indexed: 05/09/2023] Open
Abstract
Introduction: Rapid development in synthetic technologies has boosted DNA as a potential medium for large-scale data storage. Meanwhile, how to implement data security in the DNA storage system is still an unsolved problem. Methods: In this article, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to take advantage of the unpredictable modulation signals to encrypt images in highly error-prone DNA storage channels. Results and Discussion: Numerical results have demonstrated that our image encryption method is feasible and effective with excellent security against various attacks (statistical, differential, noise, and data loss). When compared with other methods such as the hybridization reactions of DNA molecules, the proposed method is more reliable and feasible for large-scale applications.
Affiliation(s)
- Xiangzhen Zan
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ling Chu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Ranze Xie
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Yanqing Su
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Xiangyu Yao
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
| | - Peng Xu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China
| | - Wenbin Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China
|
30
|
Xie R, Zan X, Chu L, Su Y, Xu P, Liu W. Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage. BMC Bioinformatics 2023; 24:111. [PMID: 36959531 PMCID: PMC10037887 DOI: 10.1186/s12859-023-05237-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/20/2022] [Accepted: 03/17/2023] [Indexed: 03/25/2023] Open
Abstract
Synchronization (insertions-deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
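Once reads have been aligned, a common way to recover a sequence is a column-wise majority vote. The Python sketch below assumes the alignment has already been produced by an MSA tool such as MAFFT and that gaps are written as `-`. It illustrates only the voting step, not the paper's pipeline; the function name `consensus` is an assumption for this example.

```python
from collections import Counter


def consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over equal-length aligned reads.

    Columns whose most frequent symbol is the gap are dropped. Ties are
    resolved in favour of the symbol seen first in the column, which is
    how Counter.most_common orders equal counts.
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    out = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)
```

For instance, `consensus(["ACGT-A", "AC-TTA", "ACGT-A"])` outvotes the single deletion and the single insertion and returns `"ACGTA"`. Adding reads (a higher sequencing depth) gives each column more votes.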
Affiliation(s)
- Ranze Xie
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
| | - Xiangzhen Zan
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
| | - Ling Chu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
| | - Yanqing Su
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
| | - Peng Xu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.
| | - Wenbin Liu
- Institution of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.
|
31
|
Rasool A, Jiang Q, Wang Y, Huang X, Qu Q, Dai J. Evolutionary approach to construct robust codes for DNA-based data storage. Front Genet 2023; 14:1158337. [PMID: 37021008 PMCID: PMC10067891 DOI: 10.3389/fgene.2023.1158337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 03/02/2023] [Indexed: 04/07/2023] Open
Abstract
DNA is a practical storage medium with high density, durability, and capacity to accommodate exponentially growing data volumes. Designing a DNA sequence structure is a biocomputing problem that requires satisfying bioconstraints to obtain robust sequences. Existing evolutionary approaches to DNA sequences result in errors during the encoding process that reduce the lower bounds of DNA coding sets used for molecular hybridization. Additionally, a disordered DNA strand forms a secondary structure, which is susceptible to errors during decoding. This paper proposes a computational evolutionary approach based on a synergistic moth-flame optimizer (MFOS) that uses Lévy flight and opposition-based learning mutation strategies to address these problems by constructing reverse-complement constraints. The MFOS aims to attain optimal global solutions with robust convergence and balanced search capabilities to improve DNA code lower bounds and coding rates for DNA storage. The ability of the MFOS to construct DNA coding sets is demonstrated through various experiments that use 19 state-of-the-art functions. Compared with existing studies, the proposed approach with three different bioconstraints substantially improves the lower bounds of the DNA codes by 12-28% and significantly reduces errors.
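The reverse-complement constraint mentioned here is usually stated as follows in the DNA coding literature: for a code with distance parameter d, the Hamming distance between any codeword x and the reverse complement of any codeword y (including y = x) must be at least d. The Python sketch below checks that condition for a candidate set. It illustrates the constraint only, not the MFOS optimizer, and the function names are assumptions for this example.

```python
# Watson-Crick complement table: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_rc_constraint(codewords: list[str], d: int) -> bool:
    """True if H(x, reverse_complement(y)) >= d for every pair (x, y),
    including the case x == y."""
    return all(hamming(x, reverse_complement(y)) >= d
               for x in codewords
               for y in codewords)
```

As a usage note, `"ACGT"` equals its own reverse complement, so `satisfies_rc_constraint(["ACGT"], 1)` is false, whereas the set `["AAAA", "CCCC"]` satisfies the constraint with d = 4.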
Affiliation(s)
- Abdur Rasool
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, China
| | - Qingshan Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- *Correspondence: Qingshan Jiang,
| | - Yang Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Xiaoluo Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Qiang Qu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Junbiao Dai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|