1
|
Ma Y, Chen S, Xu Q, Lu Z, Bi K. High-Risk Sequence Prediction Model in DNA Storage: The LQSF Method. IEEE Trans Nanobioscience 2025; 24:89-101. [PMID: 38976468 DOI: 10.1109/tnb.2024.3424576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Traditional DNA storage technologies rely on passive filtering for error correction during synthesis and sequencing, which introduces redundancy and still leaves error correction inadequate. To address this, the Low Quality Sequence Filter (LQSF) is introduced: a method that uses deep learning models to predict high-risk sequences. LQSF relies on a classification model trained on error-prone sequences, which allows low-quality sequences to be filtered out before sequencing and saves time and resources in subsequent stages. Analysis demonstrated a clear distinction between high- and low-quality sequences, confirming the efficacy of the method. Extensive training and testing were conducted across various neural networks and test sets. All models achieved an AUC above 0.91 on ROC curves and above 0.95 on PR curves across the different datasets. Notably, models such as AlexNet, VGG16, and VGG19 achieved a perfect AUC of 1.0 on the Original dataset. Further validation using Illumina sequencing data showed a strong correlation between model scores and sequence error-proneness, supporting the model's applicability. By introducing active sequence filtering at the encoding stage, the LQSF method holds substantial promise for future DNA storage research and applications.
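The evaluation and filtering idea described in this abstract can be illustrated with a minimal sketch. It assumes each sequence already has a risk score from some classifier; the function names, the 0.5 threshold, and the pairwise AUC computation are illustrative assumptions, not the authors' implementation.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) identity.

    labels: 1 for high-risk (positive), 0 for low-risk (negative).
    scores: predicted risk scores, higher means more likely high-risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def filter_sequences(sequences, scores, threshold=0.5):
    """Keep sequences whose risk score is below a (hypothetical) threshold."""
    return [seq for seq, s in zip(sequences, scores) if s < threshold]
```

An AUC of 1.0, as reported for some models, corresponds to every high-risk sequence scoring above every low-risk one.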
|
2
|
Cao B, Wang K, Xie L, Zhang J, Zhao Y, Wang B, Zheng P. PELMI: Realize robust DNA image storage under general errors via parity encoding and local mean iteration. Brief Bioinform 2024; 25:bbae463. [PMID: 39288232 PMCID: PMC11407442 DOI: 10.1093/bib/bbae463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 09/01/2024] [Accepted: 09/04/2024] [Indexed: 09/19/2024] [Open access]
Abstract
DNA molecules offer high encoding density and low energy consumption as a storage medium, making DNA storage a highly promising storage method. However, DNA storage has shortcomings, especially for multimedia data: when address errors occur, image reconstruction fails and the data are lost completely. Therefore, we propose a parity encoding and local mean iteration (PELMI) scheme to achieve robust DNA storage of images. The proposed parity encoding scheme satisfies the common biochemical constraints on DNA sequences, including the constraint on undesired motif content. It accounts for the varying weights of pixel bits at different positions in the binary data, thus optimizing the use of Reed-Solomon error correction. Then, for lost and erroneous sequences, data supplementation and local mean iteration are employed to enhance robustness. The encoding results show that the undesired motif content is reduced by 23%-50% compared with representative schemes, which improves sequence stability. PELMI achieves image reconstruction under general errors (insertion, deletion, substitution) and enhances DNA sequence quality. In particular, under a 1% error rate, compared with other advanced encoding schemes, the peak signal-to-noise ratio and the multiscale structure similarity address metric increased by 10%-13% and 46.8%-122%, respectively, and the mean squared error decreased by 113%-127%. This demonstrates that the reconstructed images had better clarity, fidelity, and similarity in structure, texture, and detail. In summary, PELMI ensures robust and stable image storage in DNA and achieves relatively high-quality image reconstruction under general errors.
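Two of the reconstruction metrics named in this abstract, mean squared error and peak signal-to-noise ratio, have standard definitions that can be sketched as follows. The sketch assumes 8-bit pixel values passed as flat sequences; the function names are illustrative and this is not the authors' evaluation code.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit pixels."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / err)
```

A lower MSE and a higher PSNR both indicate a reconstruction closer to the original image.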
Affiliation(s)
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning 116024, China
- Kun Wang: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Lei Xie: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Jianxia Zhang: School of Intelligent Engineering, Henan Institute of Technology, No. 90, East Hualan Avenue, Hongqi District, Xinxiang, Henan 451191, China
- Yunzhu Zhao: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Bin Wang: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Pan Zheng: Department of Accounting and Information Systems, University of Canterbury, Upper Riccarton, Christchurch 8140, New Zealand
|
3
|
Wang K, Cao B, Ma T, Zhao Y, Zheng Y, Wang B, Zhou S, Zhang Q. Storing Images in DNA via base128 Encoding. J Chem Inf Model 2024; 64:1719-1729. [PMID: 38385334 DOI: 10.1021/acs.jcim.3c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Current DNA storage schemes lack flexibility and consistency in processing highly redundant and correlated image data, resulting in low sequence stability and low image reconstruction rates. Therefore, based on the characteristics of image storage, this paper proposes storing images in DNA via base128 encoding (DNA-base128). In the data writing stage, data segmentation and probability statistics are carried out, and the data block frequencies are then associated with a constraint encoding set to perform the encoding. When the image needs to be recovered, DNA-base128 performs internal error correction through threshold setting and drift comparison. Compared with representative work, the DNA-base128 encoding results show that undesired motifs were reduced by 71.2-90.7% and that the local guanine-cytosine content variance was reduced by a factor of 3, indicating that DNA-base128 can store images more stably. In addition, the structural similarity index (SSIM) and multiscale structural similarity (MS-SSIM) of image reconstruction using DNA-base128 improved by 19-102% and 6.6-20.3%, respectively. In summary, DNA-base128 provides image encoding with internal error correction and offers a potential solution for DNA image storage. The data and code are available at the GitHub repository: https://github.com/123456wk/DNA_base128.
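The two sequence-quality measures reported in this abstract, local GC-content variance and undesired motif counts, can be sketched as below. The window size, the use of non-overlapping windows, and the motif list passed in are assumptions for illustration; the abstract does not specify them, and this is not the code from the authors' repository.

```python
from statistics import pvariance


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def local_gc_variance(seq, window=10):
    """Population variance of GC content over non-overlapping windows."""
    chunks = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    if not chunks:
        raise ValueError("sequence shorter than window")
    return pvariance([gc_content(c) for c in chunks])


def count_motifs(seq, motifs):
    """Count occurrences (overlaps included) of each motif in seq."""
    return {
        m: sum(seq.startswith(m, i) for i in range(len(seq) - len(m) + 1))
        for m in motifs
    }
```

A lower local GC variance means the GC content is spread more evenly along the strand, which is the sense in which the abstract links it to stability.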
Affiliation(s)
- Kun Wang: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Ma: Brain Function Research Section, China Medical University, Shenyang 110001, China
- Yunzhu Zhao: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Yanfen Zheng: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang: The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
|
4
|
Mu Z, Cao B, Wang P, Wang B, Zhang Q. RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. IEEE Trans Nanobioscience 2023; 22:912-922. [PMID: 37028365 DOI: 10.1109/tnb.2023.3254514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
|
5
|
Park SJ, Kim S, Jeong J, No A, No JS, Park H. Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads. Bioinformatics 2023; 39:btad548. [PMID: 37669160 PMCID: PMC10500082 DOI: 10.1093/bioinformatics/btad548] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] [Open access]
Abstract
MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed.
RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works.
AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.
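The use of edit distance to discard abnormal reads, mentioned in this abstract, can be sketched with the standard Levenshtein dynamic program. The reference sequence, the `max_distance` cutoff, and the function names are hypothetical; the authors' pipeline also uses quality scores and clustering, which this sketch omits.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def keep_reads(reads, reference, max_distance=2):
    """Discard reads farther than max_distance from a reference sequence."""
    return [r for r in reads if edit_distance(r, reference) <= max_distance]
```

Edit distance suits variable-length reads because it accounts for insertions and deletions, unlike Hamming distance, which requires equal lengths.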
Affiliation(s)
- Seong-Joon Park: Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Sunghwan Kim: Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, South Korea
- Jaeho Jeong: Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Albert No: Department of Electronic and Electrical Engineering, Hongik University, Seoul 04066, South Korea
- Jong-Seon No: Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Hosung Park: Department of Computer Engineering, Chonnam National University, Gwangju 61186, South Korea; Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 61186, South Korea
|
6
|
Yang X, Shi X, Lai L, Chen C, Xu H, Deng M. Towards long double-stranded chains and robust DNA-based data storage using the random code system. Front Genet 2023; 14:1179867. [PMID: 37384333 PMCID: PMC10294226 DOI: 10.3389/fgene.2023.1179867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 05/31/2023] [Indexed: 06/30/2023] [Open access]
Abstract
DNA has become a popular choice for next-generation storage media due to its high storage density and stability. As the storage medium of life's information, DNA has significant storage capacity and low-cost, low-power replication and transcription capabilities. However, utilizing long double-stranded DNA for storage can introduce unstable factors that make it difficult to meet the constraints of biological systems. To address this challenge, we have designed a highly robust coding scheme called the "random code system," inspired by the idea of fountain codes. The random code system includes the establishment of a random matrix, Gaussian preprocessing, and random equilibrium. Compared to Luby transform codes (LT codes), random code (RC) has better robustness and recovery ability of lost information. In biological experiments, we successfully stored 29,390 bits of data in 25,700 bp chains, achieving a storage density of 1.78 bits per nucleotide. These results demonstrate the potential for using long double-stranded DNA and the random code system for robust DNA-based data storage.
Affiliation(s)
- Xu Yang: Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Xiaolong Shi: Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Langwen Lai: Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Congzhou Chen: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Huaisheng Xu: Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Ming Deng: Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
|
7
|
Zeng C, Liu X, Wang B, Qin R, Zhang Q. Multifunctional Exo III-assisted scalability strategy for constructing DNA molecular logic circuits. Analyst 2023; 148:1954-1960. [PMID: 36994799 DOI: 10.1039/d3an00086a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
The construction of logic circuits is critical to DNA computing. Simple and effective scalability methods have been the focus of attention in various fields related to constructing logic circuits. We propose a double-stranded separation (DSS) strategy to facilitate the construction of complex circuits. The strategy combines toehold-mediated strand displacement with exonuclease III (Exo III), which is a multifunctional nuclease. Exo III can quickly recognize an apurinic/apyrimidinic (AP) site. DNA oligos with an AP site can generate an output signal by the strand displacement reaction. However, in contrast to traditional strand displacement reactions, the double-stranded waste from the strand displacement can be further hydrolysed by the endonuclease function of Exo III, thus generating an additional output signal. The DSS strategy allows for the effective scalability of molecular logic circuits, enabling multiple logic computing capabilities simultaneously. In addition, we succeeded in constructing a logic circuit with dual logic functions that provides foundations for more complex circuits in the future and has a broad scope for development in logic computing, biosensing, and nanomachines.
Affiliation(s)
- Chenyi Zeng: Key Laboratory of Advanced Design and Intelligent Computing, School of Software Engineering, Dalian University, Dalian 116622, China
- Xin Liu: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang: Key Laboratory of Advanced Design and Intelligent Computing, School of Software Engineering, Dalian University, Dalian 116622, China
- Rui Qin: Key Laboratory of Advanced Design and Intelligent Computing, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang: Key Laboratory of Advanced Design and Intelligent Computing, School of Software Engineering, Dalian University, Dalian 116622, China
|
8
|
Liu Y, Wang J, Sun L, Wang B, Zhang Q, Zhang X, Cao B. Active Self-Assembly of Ladder-Shaped DNA Carrier for Drug Delivery. Molecules 2023; 28:797. [PMID: 36677855 PMCID: PMC9862081 DOI: 10.3390/molecules28020797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 01/07/2023] [Accepted: 01/10/2023] [Indexed: 01/15/2023]
Abstract
With the advent of nanotechnology, DNA molecules have been transformed from solely genetic information carriers to multifunctional materials, showing a tremendous potential for drug delivery and disease diagnosis. In drug delivery systems, DNA is used as a building material to construct drug carriers through a variety of DNA self-assembly methods, which can integrate multiple functions to complete in vivo and in situ tasks. In this study, ladder-shaped drug carriers are developed for drug delivery on the basis of a DNA nanoladder. We first demonstrate the overall structure of the nanoladder, in which a nick is added into each rung of the nanoladder to endow the nanoladder with the ability to incorporate a drug loading site. The structure is designed to counteract the decrement of stability caused by the nick and investigated in different conditions to gain insight into the properties of the nicked DNA nanoladders. As a proof of concept, we fix the biotin in every other nick as a loading site and assemble the protein (streptavidin) on the loading site to demonstrate the feasibility of the drug-carrying function. The protein can be fixed stably and can be extended to different biological and chemical drugs by altering the drug loading site. We believe this design approach will be a novel addition to the toolbox of DNA nanotechnology, and it will be useful for versatile applications such as in bioimaging, biosensing, and targeted therapy.
Affiliation(s)
- Yuan Liu: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jiaxin Wang: Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Lijun Sun: Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Bin Wang: Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Qiang Zhang (corresponding author): School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Xiaokang Zhang: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
9
|
Rasool A, Jiang Q, Wang Y, Huang X, Qu Q, Dai J. Evolutionary approach to construct robust codes for DNA-based data storage. Front Genet 2023; 14:1158337. [PMID: 37021008 PMCID: PMC10067891 DOI: 10.3389/fgene.2023.1158337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 03/02/2023] [Indexed: 04/07/2023] [Open access]
Abstract
DNA is a practical storage medium with high density, durability, and the capacity to accommodate exponentially growing data volumes. Designing a DNA sequence structure is a biocomputing problem that requires satisfying bioconstraints to obtain robust sequences. Existing evolutionary approaches to DNA sequence design introduce errors during the encoding process that reduce the lower bounds of the DNA coding sets used for molecular hybridization. Additionally, a disordered DNA strand forms a secondary structure, which is susceptible to errors during decoding. This paper proposes a computational evolutionary approach based on a synergistic moth-flame optimizer (MFOS) with Levy flight and opposition-based learning mutation strategies, which addresses these problems by constructing reverse-complement constraints. The MFOS aims to attain optimal global solutions with robust convergence and balanced search capabilities to improve DNA code lower bounds and coding rates for DNA storage. The ability of the MFOS to construct DNA coding sets is demonstrated through various experiments that use 19 state-of-the-art functions. Compared with existing studies, the proposed approach with three different bioconstraints substantially improves the lower bounds of the DNA codes by 12-28% and significantly reduces errors.
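The reverse-complement constraint mentioned in this abstract is commonly stated as requiring the Hamming distance between any codeword and the reverse complement of any codeword to be at least some value d. A minimal check under that common definition is sketched below; whether the pair of a codeword with itself is included, and the function names, are assumptions, and the sketch does not reproduce the MFOS optimizer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_rc_constraint(codewords, d):
    """True if H(x, rc(y)) >= d for all pairs, including x == y (assumed)."""
    return all(
        hamming(x, reverse_complement(y)) >= d
        for x in codewords
        for y in codewords
    )
```

A code that passes this check is less likely to contain two strands that hybridize with each other, which is the motivation usually given for the constraint.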
Affiliation(s)
- Abdur Rasool: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, China
- Qingshan Jiang (corresponding author): Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yang Wang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Xiaoluo Huang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Qiang Qu: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Junbiao Dai: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
10
|
Qi M, Shi P, Zhang X, Cui S, Liu Y, Zhou S, Zhang Q. Reconfigurable DNA triplex structure for pH responsive logic gates. RSC Adv 2023; 13:9864-9870. [PMID: 36998523 PMCID: PMC10043996 DOI: 10.1039/d3ra00536d] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/25/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023] [Open access]
Abstract
The DNA triplex is a special DNA structure often used as a logic gate substrate due to its high stability, programmability, and pH responsiveness. However, multiple triplex structures with different C−G−C+ proportions must be introduced into existing triplex logic gates due to the numerous logic calculations involved. This requirement complicates circuit design and results in many reaction by-products, greatly restricting the construction of large-scale logic circuits. Thus, we designed a new reconfigurable DNA triplex structure (RDTS) and, through its conformational change, constructed pH-responsive logic gates that perform two types of logic calculation, ‘AND’ and ‘OR’. These logic gates require fewer substrates when both types of logic calculation are needed, further enhancing the extensibility of the logic circuit. This result is expected to promote the development of the triplex in molecular computing and facilitate the completion of large-scale computing networks.
Affiliation(s)
- Mingxuan Qi: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Peijun Shi: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Xiaokang Zhang: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Shuang Cui: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yuan Liu: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Shihua Zhou: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
|