1
Xie L, Cao B, Wen X, Zheng Y, Wang B, Zhou S, Zheng P. ReLume: Enhancing DNA storage data reconstruction with flow network and graph partitioning. Methods 2025; 240:101-112. [PMID: 40268154 DOI: 10.1016/j.ymeth.2025.03.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2025] [Revised: 03/06/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025]
Abstract
DNA storage is an ideal alternative to silicon-based storage, but focusing on data writing alone cannot address the inevitable errors and durability issues. Therefore, we propose ReLume, a DNA storage data reconstruction method based on flow networks and graph partitioning, which can reconstruct data from millions of reads on a laptop with 24 GB of RAM. The results show that ReLume copes well with many types of errors, more than doubles sequence recovery rates, and reduces memory usage by about 60%. ReLume is 10 times more durable than other representative methods, meaning that data can be read without loss after 100 years. Results on wet-lab DNA storage data show sequence recovery rates of 73% and 93.2%, which significantly outperform existing methods. In summary, ReLume effectively overcomes accuracy and hardware limitations and provides a feasible approach to portable DNA storage.
Affiliation(s)
- Lei Xie: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Xiaoru Wen: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Yanfen Zheng: School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Bin Wang: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Shihua Zhou: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Pan Zheng: Department of Accounting and Information Systems, University of Canterbury, 8140 Christchurch, New Zealand
2
Zhang X, Lu Y. "Galaxy" Encoding: Toward High Storage Density and Low Cost. IEEE Trans Nanobioscience 2025; 24:200-207. [PMID: 39466861 DOI: 10.1109/tnb.2024.3481504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2024]
Abstract
DNA is considered one of the most attractive storage media because of its excellent reliability and durability. Early encoding schemes lacked flexibility and scalability. To address these limitations, we propose a combination of static mapping and dynamic encoding, named "Galaxy" encoding. This scheme uses both the "dual-rule interleaving" algorithm and the "twelve-element Huffman rotational encoding" algorithm. We tested it with "Shakespeare Sonnets" and other files, achieving an encoding information density of approximately 2.563 bits/nt. Additionally, the inclusion of Reed-Solomon error-correcting codes can correct nearly 5% of the errors. Our simulations show that it supports various file types (.gz, .tar, .exe, etc.). We also analyzed the cost and fault tolerance of "Galaxy" encoding, demonstrating its high coding efficiency and ability to fully recover original information while effectively reducing the costs of DNA synthesis and sequencing.
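The trade-off between Reed-Solomon redundancy and information density mentioned above can be illustrated with a short calculation. This is a minimal sketch, not the "Galaxy" implementation: the code parameters `n = 255` and `k = 229` are hypothetical, and the abstract does not say whether the reported 2.563 bits/nt already includes the Reed-Solomon overhead. It relies only on the standard property that an RS(n, k) code corrects up to (n - k) / 2 symbol errors per codeword.

```python
def rs_correctable_fraction(n: int, k: int) -> float:
    """Fraction of the n codeword symbols that an RS(n, k) code can correct."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    t = (n - k) // 2  # maximum number of correctable symbol errors
    return t / n


def density_after_overhead(bits_per_nt: float, n: int, k: int) -> float:
    """Scale a pre-ECC density by the code rate k / n."""
    return bits_per_nt * k / n


if __name__ == "__main__":
    n, k = 255, 229  # hypothetical parameters, chosen to land near 5 %
    print(f"correctable fraction: {rs_correctable_fraction(n, k):.3f}")
    print(f"density after overhead: {density_after_overhead(2.563, n, k):.3f} bits/nt")
```

With these assumed parameters the code corrects 13 of 255 symbols per codeword, at the price of a code rate of 229/255.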
3
Xu X, Wang W, Ping Z. Biotechnological tools boost the functional diversity of DNA-based data storage systems. Comput Struct Biotechnol J 2025; 27:624-630. [PMID: 40027441 PMCID: PMC11869497 DOI: 10.1016/j.csbj.2025.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 02/05/2025] [Accepted: 02/05/2025] [Indexed: 03/05/2025]
Abstract
DNA-based data storage has emerged as a groundbreaking solution to the growing demand for efficient, high-density, and long-term data storage. It has attracted many researchers, who are implementing functions such as random access, search, and data operations in addition to the existing capabilities of reading and writing. We summarize recent progress in how biotechnological tools, based on the sequence specificity, encapsulation, and high-dimensional structures of DNA molecules, facilitate the implementation of these functions. We also discuss the limitations of biochemical reactions that hinder the development of more precise and efficient information storage systems. Future advancements in molecular biology and nanotechnology are expected to improve the architecture, scalability, and efficiency of DNA storage, positioning it as a sustainable and dynamic alternative to conventional data storage systems.
Affiliation(s)
- Xiaoyuan Xu: School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wen Wang: BGI Research, Beijing 100101, China; BGI Research, Shenzhen 518083, China
- Zhi Ping: School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; BGI Research, Beijing 100101, China; BGI Research, Shenzhen 518083, China
4
Gao C, Bao W, Wang S, Zheng J, Wang L, Ren Y, Jiao L, Wang J, Wang X. DockingGA: enhancing targeted molecule generation using transformer neural network and genetic algorithm with docking simulation. Brief Funct Genomics 2024; 23:595-606. [PMID: 38582610 DOI: 10.1093/bfgp/elae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/25/2024] [Accepted: 03/13/2024] [Indexed: 04/08/2024]
Abstract
Generative molecular models generate novel molecules with desired properties by searching chemical space. Traditional combinatorial optimization methods, such as genetic algorithms, have demonstrated superior performance in various molecular optimization tasks. However, these methods do not use docking simulation to inform the design process, depend heavily on the quality and quantity of available data, and require additional structural optimization before their outputs can become candidate drugs. To address these limitations, we propose a novel model named DockingGA that combines Transformer neural networks and genetic algorithms to generate molecules with better binding affinity for specific targets. To generate high-quality molecules, we chose the Self-referencing Chemical Structure Strings to represent the molecule and optimized the binding affinity of the molecules to different targets. Compared to other baseline models, DockingGA proves to be the optimal model in all docking results for the top 1, 10 and 100 molecules, while maintaining 100% novelty. Furthermore, the distribution of physicochemical properties demonstrates the ability of DockingGA to generate molecules with favorable and appropriate properties. This innovation creates new opportunities for the application of generative models in practical drug discovery.
Affiliation(s)
- Changnan Gao: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Wenjie Bao: Guanghua School of Management, Peking University, Beijing 100091, China
- Shuang Wang: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Jianyang Zheng: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Lulu Wang: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Yongqi Ren: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Linfang Jiao: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Jianmin Wang: The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Republic of Korea
- Xun Wang: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China; High Performance Computer Research Center, Institute of Computing Technology, CAS, Beijing 100190, China
5
Zhang J. Levy Sooty Tern Optimization Algorithm Builds DNA Storage Coding Sets for Random Access. Entropy (Basel) 2024; 26:778. [PMID: 39330111 PMCID: PMC11431215 DOI: 10.3390/e26090778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/02/2024] [Accepted: 09/05/2024] [Indexed: 09/28/2024]
Abstract
DNA molecules, as a storage medium, possess unique advantages. Not only does DNA storage exhibit significantly higher storage density compared to electromagnetic storage media, but it also features low energy consumption and extremely long storage times. However, the integration of DNA storage into daily life remains distant due to challenges such as low storage density, high latency, and inevitable errors during the storage process. Therefore, this paper proposes constructing a DNA storage coding set based on the Levy Sooty Tern Optimization Algorithm (LSTOA) to achieve an efficient random-access DNA storage system. Firstly, to address the slow iteration speed and susceptibility to local optima of the Sooty Tern Optimization Algorithm (STOA), this paper introduces Levy flight operations and proposes the LSTOA. Secondly, using the LSTOA, this paper constructs a DNA storage encoding set that facilitates random access while meeting combinatorial constraints. To demonstrate the performance of the LSTOA, this paper evaluates it on 13 benchmark test functions, which show its superior performance. Furthermore, under the same combinatorial constraints, the LSTOA constructs larger DNA storage coding sets, effectively reducing the read-write latency and error rate of DNA storage.
Affiliation(s)
- Jianxia Zhang: College of Mathematics and Information Science, Henan Normal University, Xinxiang 453003, China; School of Intelligent Engineering, Henan Institute of Technology, Xinxiang 453003, China
6
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024]
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng: School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang: School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Xiaopeng Wei: School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
7
Zhang X, Qi B, Niu Y. A dual-rule encoding DNA storage system using chaotic mapping to control GC content. Bioinformatics 2024; 40:btae113. [PMID: 38419588 PMCID: PMC10937898 DOI: 10.1093/bioinformatics/btae113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 02/21/2024] [Accepted: 02/26/2024] [Indexed: 03/02/2024]
Abstract
Motivation: DNA as a novel storage medium is considered an effective solution to the world's growing demand for information due to its high density and long-lasting reliability. However, early coding schemes ignored the biologically constrained nature of DNA sequences in pursuit of high density, leading to DNA synthesis and sequencing difficulties. This article proposes a novel DNA storage coding scheme. The system encodes half of the binary data using each of the two GC-content complementary encoding rules to obtain a DNA sequence. Results: After simulating the encoding of representative document and image file formats, a DNA sequence strictly conforming to biological constraints was obtained, reaching a coding potential of 1.66 bit/nt. In the decoding process, a mechanism to prevent error propagation was introduced. The simulation results demonstrate that by adding Reed-Solomon code, 90% of the data can still be recovered after introducing a 2% error, proving that the proposed DNA storage scheme has high robustness and reliability. Availability and implementation: The source code for the codec scheme of this paper is available at https://github.com/Mooreniah/DNA-dual-rule-rotary-encoding-storage-system-DRRC.
Affiliation(s)
- Xuncai Zhang: College of Electrical Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
- Baonan Qi: College of Electrical Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
- Ying Niu: College of Building Environment Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
8
Jeong J, Park H, Kwak HY, No JS, Jeon H, Lee JW, Kim JW. Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding. IEEE Trans Nanobioscience 2024; 23:81-90. [PMID: 37294652 DOI: 10.1109/tnb.2023.3284406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Ever since deoxyribonucleic acid (DNA) was considered as a next-generation data-storage medium, much research effort has gone into correcting errors that occur during the synthesis, storage, and sequencing processes using error-correcting codes (ECCs). Previous works on recovering data from a sequenced DNA pool with errors have used hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and the robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method that may be suitable for error correction and detection in DNA sequencing. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency in the performance evaluation. The proposed soft decoding algorithm gives a 2.3%-7.0% improvement in read-number reduction compared to the state-of-the-art decoding method, and it is shown that it can deal with erroneous sequenced oligo reads containing insertion and deletion errors.
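To make the idea of soft information from FASTQ quality scores concrete, the sketch below converts Phred Q-scores to error probabilities with the standard relation P = 10^(-Q/10) and derives a simple per-base reliability value. It is illustrative only: the log-odds `log((1 - P) / P)` used here is a generic stand-in, not the LLR formula proposed in the paper, and the function names and the Phred+33 offset default are assumptions.

```python
import math


def fastq_qualities(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into Q-scores."""
    return [ord(ch) - offset for ch in quality_line]


def phred_to_error_prob(q: int) -> float:
    """Standard Phred relation: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def base_reliability(q: int) -> float:
    """Generic log-odds that a base call is correct (not the paper's LLR)."""
    p = phred_to_error_prob(q)
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep the logarithm finite
    return math.log((1 - p) / p)


if __name__ == "__main__":
    for q in fastq_qualities("I5+"):
        print(q, phred_to_error_prob(q), round(base_reliability(q), 2))
```

A soft decoder can weight each base by such a value instead of treating every read equally, which is what separates it from a hard majority vote.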
9
Rasool A, Hong J, Jiang Q, Chen H, Qu Q. BO-DNA: Biologically optimized encoding model for a highly-reliable DNA data storage. Comput Biol Med 2023; 165:107404. [PMID: 37666064 DOI: 10.1016/j.compbiomed.2023.107404] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2023] [Revised: 08/13/2023] [Accepted: 08/26/2023] [Indexed: 09/06/2023]
Abstract
DNA data storage is a promising technology that uses computer simulation and synthetic biology to offer high-density and reliable digital information storage. It is challenging to store massive data in a small amount of DNA without losing the original data, since nonspecific hybridization errors occur frequently and severely affect the reliability of stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to overcome the reliability problem. The BO-DNA model is developed using a new rule-based mapping method to avoid data drop during the transcoding of binary data to premier nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps to minimize nonspecific hybridization errors. The robustness of BO-DNA is evaluated against four bio-constraints to confirm the reliability of newly generated DNA sequences. Experimentally, different medical images are encoded and decoded successfully with 12%-59% improved lower bounds and optimally constrained-based DNA sequences reported with 1.77 bit/nt average density. BO-DNA's results demonstrate substantial advantages in constructing reliable DNA data storage.
Affiliation(s)
- Abdur Rasool: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingwei Hong: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Qingshan Jiang: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hui Chen: Shenzhen Polytechnic University, Shenzhen, 518055, Guangdong, China
- Qiang Qu: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
10
Mu Z, Cao B, Wang P, Wang B, Zhang Q. RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. IEEE Trans Nanobioscience 2023; 22:912-922. [PMID: 37028365 DOI: 10.1109/tnb.2023.3254514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
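For readers unfamiliar with rotational coding, the sketch below shows the generic idea in its simplest form: each ternary digit selects one of the three bases that differ from the previous base, so identical adjacent bases can never occur. This is not the RBS scheme, whose blocking strategy is not described in the abstract; the starting base `"A"`, the six-trits-per-byte conversion, and all function names are assumptions made for illustration.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def rotate_encode(trits: list[int], start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous one."""
    prev, out = start, []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def rotate_decode(seq: str, start: str = "A") -> list[int]:
    """Recover the trits by replaying the same rotation."""
    prev, trits = start, []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


if __name__ == "__main__":
    dna = rotate_encode(bytes_to_trits(b"DNA"))
    print(dna, trits_to_bytes(rotate_decode(dna)))
```

This naive byte-wise conversion stores 8 bits in 6 nt (about 1.33 bits/nt), and any code of this rotating form is capped at log2(3), roughly 1.58 bits/nt, which is the kind of density gap that more elaborate schemes try to close.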
11
Zheng Y, Cao B, Wu J, Wang B, Zhang Q. High Net Information Density DNA Data Storage by the MOPE Encoding Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2992-3000. [PMID: 37015121 DOI: 10.1109/tcbb.2023.3263521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/19/2023]
Abstract
DNA has recently been recognized as an attractive storage medium due to its high reliability, capacity, and durability. However, encoding algorithms that simply map binary data to DNA sequences have the disadvantages of low net information density and high synthesis cost. Therefore, this paper proposes an efficient, feasible, and highly robust encoding algorithm called MOPE (Modified Barnacles Mating Optimizer and Payload Encoding). The Modified Barnacles Mating Optimizer (MBMO) algorithm is used to construct the non-payload coding set, and the Payload Encoding (PE) algorithm is used to encode the payload. The results show that the lower bound of the non-payload coding set constructed by the MBMO algorithm is 3%-18% higher than the optimal result of previous work, and theoretical analysis shows that the designed PE algorithm has a net information density of 1.90 bits/nt, which is close to the ideal information capacity of 2 bits per nucleotide. The proposed MOPE encoding algorithm with high net information density and satisfying constraints can not only effectively reduce the cost of DNA synthesis and sequencing but also reduce the occurrence of errors during DNA storage.
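The "lower bound" of a coding set refers to the size of a set of DNA words that satisfies the chosen constraints: any set actually constructed shows that at least that many codewords exist. The sketch below builds such a set with a plain greedy scan rather than the MBMO metaheuristic, which the abstract does not describe in detail. The three constraints (50% GC content, no identical adjacent bases, minimum pairwise Hamming distance) are common choices in this literature and are assumed here, as are the function names and the small parameters.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_balanced(word: str) -> bool:
    """True when exactly half of the bases are G or C."""
    return 2 * sum(ch in "GC" for ch in word) == len(word)


def no_runs(word: str) -> bool:
    """True when no two adjacent bases are identical."""
    return all(x != y for x, y in zip(word, word[1:]))


def greedy_code(n: int, d: int) -> list[str]:
    """Greedily collect length-n words that satisfy all three constraints."""
    code: list[str] = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if (gc_balanced(word) and no_runs(word)
                and all(hamming(word, c) >= d for c in code)):
            code.append(word)
    return code


if __name__ == "__main__":
    code = greedy_code(n=6, d=3)
    print(len(code), code[:5])
```

The greedy scan is exhaustive over 4^n words, so it is practical only for short words; the optimizers surveyed here aim to find larger sets than such simple baselines under the same constraints.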
12
Mortuza GM, Guerrero J, Llewellyn S, Tobiason MD, Dickinson GD, Hughes WL, Zadegan R, Andersen T. In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). BMC Bioinformatics 2023; 24:160. [PMID: 37085766 PMCID: PMC10120115 DOI: 10.1186/s12859-023-05264-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 03/30/2023] [Indexed: 04/23/2023]
Abstract
Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.
Affiliation(s)
- Golam Md Mortuza: Department of Computer Science, Boise State University, Boise, Idaho USA
- Jorge Guerrero: Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC USA
- William L. Hughes: School of Engineering, Kelowna, University of British Columbia, Kelowna, British Columbia Canada
- Reza Zadegan: Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC USA
- Tim Andersen: Department of Computer Science, Boise State University, Boise, Idaho USA
13
Du H, Zhou S, Yan W, Wang S. Study on DNA Storage Encoding Based IAOA under Innovation Constraints. Curr Issues Mol Biol 2023; 45:3573-3590. [PMID: 37185757 PMCID: PMC10136724 DOI: 10.3390/cimb45040233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 04/09/2023] [Accepted: 04/13/2023] [Indexed: 05/17/2023]
Abstract
With the informationization of social processes, the amount of related data has greatly increased, making traditional storage media unable to meet the current requirements for data storage. Due to its advantages of a high storage capacity and persistence, deoxyribonucleic acid (DNA) has been considered the most promising storage medium to solve the data storage problem. Synthesis is an important process for DNA storage, and low-quality DNA coding can increase errors during sequencing, which can affect the storage efficiency. To reduce errors caused by the poor stability of DNA sequences during storage, this paper proposes a method that uses the double-matching and error-pairing constraints to improve the quality of the DNA coding set. First, the double-matching and error-pairing constraints are defined to solve problems of sequences with self-complementary reactions in the solution that are prone to mismatch at the 3' end. In addition, two strategies are introduced in the arithmetic optimization algorithm, including a random perturbation of the elementary function and a double adaptive weighting strategy. An improved arithmetic optimization algorithm (IAOA) is proposed to construct DNA coding sets. The experimental results of the IAOA on 13 benchmark functions show a significant improvement in its exploration and development capabilities over the existing algorithms. Moreover, the IAOA is used in the DNA encoding design under both traditional and new constraints. The DNA coding sets are tested to estimate their quality regarding the number of hairpins and melting temperature. The DNA storage coding sets constructed in this study are improved by 77.7% at the lower boundary compared to existing algorithms. The DNA sequences in the storage sets show a reduction of 9.7-84.1% in the melting temperature variance, and the hairpin structure ratio is reduced by 2.1-80%. The results indicate that the stability of the DNA coding sets is improved under the two proposed constraints compared to traditional constraints.
Affiliation(s)
- Haigui Du: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- WeiQi Yan: School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
- Sijie Wang: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
14
Cao B, Wang B, Zhang Q. GCNSA: DNA storage encoding with a graph convolutional network and self-attention. iScience 2023; 26:106231. [PMID: 36876131 PMCID: PMC9982308 DOI: 10.1016/j.isci.2023.106231] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/04/2022] [Revised: 01/31/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023]
Abstract
DNA encoding, as a key step in DNA storage, plays an important role in reading and writing accuracy and the storage error rate. However, current encoding efficiency and encoding speed are not high enough, which limits the performance of DNA storage systems. In this work, a DNA storage encoding system with a graph convolutional network and self-attention (GCNSA) is proposed. The experimental results show that the number of DNA storage codes constructed by GCNSA increases by 14.4% on average under the basic constraints, and by 5%-40% under other constraints. The increase in DNA storage codes improves the storage density of the DNA storage system by 0.7-2.2%. GCNSA predicted more DNA storage codes in less time while ensuring the quality of the codes, which lays a foundation for higher read and write efficiency in DNA storage.
Affiliation(s)
- Ben Cao: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang: Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
15
DNA Sequence Optimization Design of Arithmetic Optimization Algorithm Based on Billiard Hitting Strategy. Interdiscip Sci 2023; 15:231-248. [PMID: 36922455 DOI: 10.1007/s12539-023-00559-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 02/26/2023] [Accepted: 02/27/2023] [Indexed: 03/17/2023]
Abstract
DNA computing is a very efficient way to calculate, but it relies on high-quality DNA sequences, which are difficult to design. The sequences sought must satisfy multiple conflicting constraints at the same time to meet the requirements of DNA computation. Therefore, we propose an improved arithmetic optimization algorithm based on a billiard hitting strategy to optimize DNA sequences. The contributions of this paper are as follows. Good point set initialization is introduced to obtain high-quality solutions and improve optimization efficiency. The billiard hitting strategy is used to change the position of the population and widen the global search scope. A stochastic lens opposition learning mechanism increases the algorithm's capacity to escape local optima. The harmonic search algorithm is introduced to clarify some unqualified secondary structures and improve the quality of the solution. Twelve benchmark functions and six other algorithms are used for comparison and ablation experiments to verify the effectiveness of the algorithm. Finally, the DNA sequences we designed are of higher quality than those produced by other advanced algorithms.
|
16
|
Rasool A, Jiang Q, Wang Y, Huang X, Qu Q, Dai J. Evolutionary approach to construct robust codes for DNA-based data storage. Front Genet 2023; 14:1158337. [PMID: 37021008 PMCID: PMC10067891 DOI: 10.3389/fgene.2023.1158337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 03/02/2023] [Indexed: 04/07/2023] Open
Abstract
DNA is a practical storage medium with high density, durability, and capacity to accommodate exponentially growing data volumes. Designing DNA sequence structures is a biocomputing problem that requires satisfying bioconstraints to obtain robust sequences. Existing evolutionary approaches to DNA sequences result in errors during the encoding process that reduce the lower bounds of DNA coding sets used for molecular hybridization. Additionally, a disordered DNA strand forms a secondary structure, which is susceptible to errors during decoding. This paper proposes a computational evolutionary approach based on a synergistic moth-flame optimizer (MFOS) with Levy flight and opposition-based learning mutation strategies to address these problems by constructing reverse-complement constraints. The MFOS aims to attain optimal global solutions with robust convergence and balanced search capabilities to improve DNA code lower bounds and coding rates for DNA storage. The ability of the MFOS to construct DNA coding sets is demonstrated through various experiments that use 19 state-of-the-art functions. Compared with existing studies, the proposed approach with three different bioconstraints substantially improves the lower bounds of the DNA codes by 12-28% and significantly reduces errors.
Affiliation(s)
- Abdur Rasool
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, China
| | - Qingshan Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- *Correspondence: Qingshan Jiang
| | - Yang Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Xiaoluo Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Qiang Qu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Junbiao Dai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| |
|
17
|
Zhang J. Levy Equilibrium Optimizer algorithm for the DNA storage code set. PLoS One 2022; 17:e0277139. [PMID: 36395269 PMCID: PMC9671426 DOI: 10.1371/journal.pone.0277139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022] Open
Abstract
The generation of massive data places higher requirements on storage technology. DNA storage is a new storage technology that uses the biological macromolecule DNA as the information carrier. Compared with traditional silicon-based storage, DNA storage has the advantages of large capacity, high density, low energy consumption and high durability. DNA coding aims to store data with as few base sequences as possible and without errors. Coding is a key technology in DNA storage, and its results directly affect storage performance and the integrity of data reading and writing. In this paper, a Levy Equilibrium Optimizer (LEO) algorithm is proposed to construct a DNA storage code set that satisfies combinatorial constraints. The performance of the proposed algorithm is tested on 13 benchmark functions, and 4 new global optima are obtained. The DNA storage code set is then constructed under the same constraints as previous work; compared with that work, the lower bound of the DNA storage code set is improved by 4-13%.
Affiliation(s)
- Jianxia Zhang
- School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, China
| |
|
18
|
Yin Q, Zheng Y, Wang B, Zhang Q. Design of Constraint Coding Sets for Archive DNA Storage. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3384-3394. [PMID: 34762590 DOI: 10.1109/tcbb.2021.3127271] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 06/13/2023]
Abstract
With the advent of the era of massive data, the growth in storage demand has far exceeded current storage capacity. DNA molecules provide a reliable solution for big data storage by virtue of their large capacity, high density, and long-term stability. To reduce errors in storage procedures, constructing a sufficiently large set of constrained codes is critical for achieving DNA storage. A new version of the Marine Predator algorithm (called QRSS-MPA) is proposed in this paper to increase the lower bound of the coding set while satisfying a specific combination of constraints. To demonstrate the effectiveness of the improvement, the classical CEC-05 test functions are used to test and compare the mean, variance, scalability, and significance. In terms of storage, the lower bound of the construction is compared with previous works and is found to be significantly improved. To prevent the emergence of secondary structures that lead to sequencing failure, we give a more stringent lower bound for the constraint coding set, which is of great significance for reducing the error rate of DNA storage amidst its rapid development.
|
19
|
Wang S, Zhou S, Yan W. An enhanced whale optimization algorithm for DNA storage encoding. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:14142-14172. [PMID: 36654084 DOI: 10.3934/mbe.2022659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/17/2023]
Abstract
Metaheuristic algorithms have the drawback of being prone to premature convergence to local optima. To overcome this disadvantage in the whale optimization algorithm, we propose an improved selective opposition whale optimization algorithm (ISOWOA) in this paper. Firstly, enhanced quasi-opposition learning (EQOBL) is applied to selectively update the position of the predator, calculate the fitness of the population before and after the update, and retain the optimal individuals as the food source position. Secondly, an improved time-varying inertia weight strategy for updating the predator position is proposed, and the position update of the food source is completed by this strategy. The performance of the algorithm is analyzed on 23 benchmark functions of CEC 2005 and 15 benchmark functions of CEC 2015 in various dimensions. The superior results are further shown by Wilcoxon's rank sum test and Friedman's nonparametric rank test. Finally, its applicability is demonstrated through applications in the field of biological computing. In this paper, our aim is to achieve access to DNA files and to design high-quantity DNA code sets with ISOWOA. The experimental results show that the lower bounds of the multi-constraint storage coding sets obtained in this paper equal or surpass those of previous optimal constructions. The data show that the number of DNA storage codes filtered by ISOWOA increased by 2-18%, which demonstrates the algorithm's reliability in practical optimization tasks.
Affiliation(s)
- Sijie Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
| | - Shihua Zhou
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
| | - Weiqi Yan
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
| |
|
20
|
Li X, Zhou S, Zou L. Design of DNA Storage Coding with Enhanced Constraints. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1151. [PMID: 36010815 PMCID: PMC9407506 DOI: 10.3390/e24081151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2022] [Revised: 08/09/2022] [Accepted: 08/17/2022] [Indexed: 05/28/2023]
Abstract
Traditional storage media are gradually becoming unable to meet the needs of data storage around the world, and one solution to this problem is DNA storage. However, errors easily arise in the subsequent sequencing and reading process of DNA storage coding. To reduce error rates, a method to enhance the robustness of the DNA storage coding set is proposed. Firstly, to reduce the likelihood of secondary structure in DNA coding sets, a repeat tandem sequence constraint is proposed. An improved DTW distance constraint is proposed to address the issue that the traditional distance constraint cannot accurately evaluate non-specific hybridization between DNA sequences. Secondly, an algorithm that combines random opposition-based learning and an eddy jump strategy with the Aquila Optimizer (AO) is proposed in this paper, which is called ROEAO. Finally, the ROEAO algorithm is used to construct coding sets with traditional constraints and with enhanced constraints, respectively. The quality of the two coding sets is evaluated by testing the number of hairpin structures and the stability of the melting temperature; the data show that the coding set constructed with ROEAO under enhanced constraints can obtain a larger lower bound while improving the coding quality.
Affiliation(s)
- Xiangjun Li
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian 116622, China
| | - Shihua Zhou
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian 116622, China
| | | |
|
21
|
Chen W, Wang S, Song T, Li X, Han P, Gao C. DCSE: Double-Channel-Siamese-Ensemble model for protein protein interaction prediction. BMC Genomics 2022; 23:555. [PMID: 35922751 PMCID: PMC9351149 DOI: 10.1186/s12864-022-08772-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 07/15/2022] [Indexed: 11/15/2022] Open
Abstract
Background Protein-protein interaction (PPI) is very important for many biochemical processes. Therefore, accurate prediction of PPI can help us better understand the role of proteins in biochemical processes. Although there are many methods to predict PPI in biology, they are time-consuming and lack accuracy, so it is necessary to build an efficient and accurate computational model in the field of PPI prediction. Results We present a novel sequence-based computational approach called DCSE (Double-Channel-Siamese-Ensemble) to predict potential PPI. In the encoding layer, we treat each amino acid as a word, and map it into an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives by Multilayer Convolutional Neural Network (MCN) and Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). Finally, the output of the feature extraction layer is fed into the prediction layer to output whether the input protein pair will interact with each other. The MCN and MBC are siamese- and ensemble-based networks, which can effectively improve the performance of the model. In order to demonstrate our model's performance, we compare it with four machine learning based and three deep learning based models. The results show that our method outperforms other models in all evaluation criteria. The Accuracy, Precision, F1, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452, 0.8609. For the other seven models, the highest Accuracy, Precision, F1, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250, 0.8572. We also test our model on the imbalanced dataset and transfer our model to another species. The results show our model is excellent. Conclusion Our model achieves the best performance by comparing it with seven other models. NLP-based coding method has a good effect on PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives and these two feature extraction layers are based on siamese and ensemble network structures. Siamese-based network structure can keep the features consistent and ensemble based network structure can effectively improve the accuracy of the model. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08772-6.
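For readers less familiar with the five reported criteria, the sketch below computes them from a binary confusion matrix using their standard definitions. The function names and the example counts are illustrative assumptions; the abstract does not give the underlying confusion matrix.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, Precision, Recall, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
    # Consistency check on the values quoted in the abstract.
    print(round(f1_from(0.9091, 0.9452), 4))
```

As a sanity check, the Precision (0.9091) and Recall (0.9452) quoted for DCSE give an F1 of about 0.9268 under this definition, which matches the reported value.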
Affiliation(s)
- Wenqi Chen
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Department of Artificial Intelligence, Polytechnical University of Madrid, Madrid, Spain
| | - Xue Li
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Peifu Han
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Changnan Gao
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| |
|
22
|
Wang P, Mu Z, Sun L, Si S, Wang B. Hidden Addressing Encoding for DNA Storage. Front Bioeng Biotechnol 2022; 10:916615. [PMID: 35928958 PMCID: PMC9344065 DOI: 10.3389/fbioe.2022.916615] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2022] [Accepted: 06/21/2022] [Indexed: 11/13/2022] Open
Abstract
DNA is a natural storage medium with the advantages of high storage density and long service life compared with traditional media. DNA storage can meet the current storage requirements for massive data. Owing to the limitations of DNA storage technology, the data need to be converted into short DNA sequences for storage. However, in the process, a large amount of physical redundancy is generated to index the short DNA sequences. To reduce redundancy, this study proposes a DNA storage encoding scheme with hidden addressing. Using an improved fountain encoding scheme, the index replaces part of the data to realize hidden addresses, and a 10.1 MB file is then encoded with the hidden addressing. First, the Dottup dot plot generator and the Jaccard similarity coefficient are used to analyze the overall self-similarity of the encoding sequence index, and then the GC content of sequence fragments is used to verify the performance of this scheme. The final results show that the encoding scheme produces indexes with lower overall self-similarity and that the local thermodynamic properties of the sequences are better. The proposed hidden addressing encoding scheme can not only improve the utilization of bases but also ensure the correct rate of DNA storage during the sequencing and decoding processes.
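The Jaccard similarity coefficient mentioned above is |A ∩ B| / |A ∪ B| for two sets A and B. The abstract does not say how index sequences are turned into sets, so the sketch below assumes a k-mer decomposition purely for illustration; `kmers`, `jaccard` and the choice of `k` are hypothetical names and parameters.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two short example index sequences (made up for this sketch).
    s1, s2 = "ACGTAC", "ACGTTT"
    print(jaccard(kmers(s1, 2), kmers(s2, 2)))
```

Lower pairwise values indicate that two index sequences share fewer subsequences, which is the sense in which a set of indexes has lower self-similarity.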
|
23
|
Adaptive coding for DNA storage with high storage density and low coverage. NPJ Syst Biol Appl 2022; 8:23. [PMID: 35788589 PMCID: PMC9253015 DOI: 10.1038/s41540-022-00233-w] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 03/01/2022] [Accepted: 06/10/2022] [Indexed: 11/09/2022] Open
Abstract
The rapid development of information technology has generated substantial data, which urgently requires new storage media and storage methods. DNA, as a storage medium with high density, high durability, and ultra-long storage time characteristics, is promising as a potential solution. However, DNA storage is still in its infancy and suffers from low space utilization of DNA strands, high read coverage, and poor coding coupling. Therefore, in this work, an adaptive coding DNA storage system is proposed to use different coding schemes for different coding region locations, and the method of adaptively generating coding constraint thresholds is used to optimize at the system level to ensure the efficient operation of each link. Images, videos, and PDF files of size 698 KB were stored in DNA using adaptive coding algorithms. The data were sequenced and losslessly decoded into raw data. Compared with previous work, the DNA storage system implemented by adaptive coding proposed in this paper has high storage density and low read coverage, which promotes the development of carbon-based storage systems.
|
24
|
Li X, Han P, Wang G, Chen W, Wang S, Song T. SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction. BMC Genomics 2022; 23:474. [PMID: 35761175 PMCID: PMC9235110 DOI: 10.1186/s12864-022-08687-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 05/07/2022] [Accepted: 06/10/2022] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) dominate intracellular molecules to perform a series of tasks such as transcriptional regulation, information transduction, and drug signalling. The traditional wet experiment method to obtain PPI information is costly and time-consuming. RESULT In this paper, SDNN-PPI, a PPI prediction method based on self-attention and deep learning, is proposed. The method adopts amino acid composition (AAC), conjoint triad (CT), and auto covariance (AC) to extract global and local features of protein sequences, and leverages self-attention to enhance DNN feature extraction to more effectively accomplish the prediction of PPIs. In order to verify the generalization ability of SDNN-PPI, 5-fold cross-validation on the intraspecific interaction datasets of Saccharomyces cerevisiae (core subset) and human is used to measure our model, in which the accuracy reaches 95.48% and 98.94%, respectively. Accuracies of 93.15% and 88.33% are obtained on the interspecific interaction datasets of human-Bacillus anthracis and human-Yersinia pestis, respectively. On the independent data sets of Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, all prediction accuracy is 100%, which is higher than the previous PPI prediction methods. To further evaluate the advantages and disadvantages of the model, the one-core and crossover networks are used to predict PPIs, and the data show that the model correctly predicts the interaction pairs in the network. CONCLUSION In this paper, AAC, CT and AC methods are used to encode the sequence, and the SDNN-PPI method is proposed to predict PPIs based on a self-attention deep learning neural network. Satisfactory results are obtained on interspecific and intraspecific data sets, and good performance is also achieved in cross-species prediction. It can also correctly predict the protein interaction of cell and tumor information contained in the one-core network and crossover network. The SDNN-PPI proposed in this paper not only explores the mechanism of protein-protein interaction, but also provides new ideas for drug design and disease prevention.
Affiliation(s)
- Xue Li
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Peifu Han
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Gan Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Wenqi Chen
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| |
|
25
|
Ezekannagha C, Becker A, Heider D, Hattab G. Design considerations for advancing data storage with synthetic DNA for long-term archiving. Mater Today Bio 2022; 15:100306. [PMID: 35677811 PMCID: PMC9167972 DOI: 10.1016/j.mtbio.2022.100306] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/21/2022] [Revised: 05/05/2022] [Accepted: 05/22/2022] [Indexed: 11/22/2022]
Abstract
Deoxyribonucleic acid (DNA) is increasingly emerging as a serious medium for long-term archival data storage because of its remarkable high-capacity, high-storage-density characteristics and its lasting ability to store data for thousands of years. Various encoding algorithms are generally required to store digital information in DNA and to maintain data integrity. Indeed, since DNA is the information carrier, its performance under different processing and storage conditions significantly impacts the capabilities of the data storage system. Therefore, the design of a DNA storage system must meet specific design considerations to be less error-prone, robust and reliable. In this work, we summarize the general processes and technologies employed when using synthetic DNA as a storage medium. We also share the design considerations for sustainable engineering to include viability. We expect this work to provide insight into how sustainable design can be used to develop an efficient and robust synthetic DNA-based storage system for long-term archiving.
Affiliation(s)
- Chisom Ezekannagha
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Corresponding author.
| | - Anke Becker
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, D-35043, Marburg, Germany
| | - Dominik Heider
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
| | - Georges Hattab
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
| |
|
26
|
Shang Z, Zhou C, Zhang Q. Chemical Reaction Networks’ Programming for Solving Equations. Curr Issues Mol Biol 2022; 44:1725-1739. [PMID: 35723377 PMCID: PMC9164072 DOI: 10.3390/cimb44040119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 11/16/2022] Open
Abstract
The computational ability of chemical reaction networks (CRNs) using DNA as the substrate has been verified previously. To solve more complex computational problems and perform the computational steps as expected, the practical design of the basic calculation modules and of the reaction steps has become a basic requirement for biomolecular computing. This paper presents a method for solving nonlinear equations in CRNs with DNA as the substrate. We used the basic calculation module of CRNs with a gateless structure to design discrete and analog algorithms and solved nonlinear equations that could not be solved in previous work, such as exponential, logarithmic, and simple trigonometric equations. The equations are solved using the transformation method, Taylor expansion, and the Newton iteration method, and simulations verified this through examples. We used and improved the basic calculation module of the CRN++ programming language, optimized the error in the basic module, and analyzed the error’s variation over time.
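To make the Newton iteration step concrete, the following is a plain numerical sketch of the update x ← x − f(x)/f′(x) applied to an exponential equation. It is ordinary floating-point Python, not a chemical reaction network, and the function name, tolerance and example equation are assumptions chosen for illustration.

```python
import math
from typing import Callable


def newton(f: Callable[[float], float],
           df: Callable[[float], float],
           x0: float,
           tol: float = 1e-12,
           max_iter: int = 50) -> float:
    """Find a root of f by Newton iteration, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)  # Newton correction from the local linearization
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    # Solve exp(x) = 2, i.e. f(x) = exp(x) - 2 with derivative exp(x).
    root = newton(lambda x: math.exp(x) - 2.0, math.exp, x0=1.0)
    print(root, math.log(2.0))
```

In a CRN setting each arithmetic operation in the update would be realized by a reaction module, which is why the accuracy of the basic modules matters for the final error.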
Affiliation(s)
- Ziwei Shang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Changjun Zhou
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China
| | - Qiang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| |
|
27
|
Liu X, Zhang Q, Zhang X, Liu Y, Yao Y, Kasabov N. Construction of Multiple Logic Circuits Based on Allosteric DNAzymes. Biomolecules 2022; 12:biom12040495. [PMID: 35454084 PMCID: PMC9032175 DOI: 10.3390/biom12040495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 03/20/2022] [Accepted: 03/21/2022] [Indexed: 11/22/2022] Open
Abstract
In DNA computing, the implementation of complex and stable logic operations in a universal system is a critical challenge. It is necessary to develop a system with complex logic functions based on a simple mechanism. Here, a strategy of controlling the secondary structure of the conserved domain of assembled DNAzymes is adopted to regulate DNAzyme activity and avoid the generation of four-way junctions, which makes it possible to implement basic logic gates and their cascade circuits in the same system. In addition, threshold control achieved through the allosteric secondary structure is used to implement a three-input DNA voter with a one-vote veto function. The scalability of the system can be remarkably improved by adjusting the threshold to implement a DNA voter with 2n + 1 inputs. The proposed strategy provides a feasible idea for constructing more complex DNA circuits and a highly integrated computing system.
Affiliation(s)
- Xin Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Correspondence: Tel.: +86-0411-84708470
| | - Xun Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yao Yao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Nikola Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Intelligent Systems Research Center, Ulster University, Londonderry BT52 1SA, UK
| |
|
28
|
Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage. MATHEMATICS 2022. [DOI: 10.3390/math10050845] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
DNA has evolved as a cutting-edge medium for digital information storage due to its extremely high density and durable preservation to accommodate the data explosion. However, the strings of DNA are prone to errors during the hybridization process. In addition, DNA synthesis and sequencing come with a cost that depends on the number of nucleotides present. An efficient model to store a large amount of data in a small number of nucleotides is essential, and it must control the hybridization errors among the base pairs. In this paper, a novel computational model is presented to design large DNA libraries of oligonucleotides. It is established by integrating a neural network (NN) with combinatorial biological constraints, including constant GC-content, Hamming distance, and reverse-complement constraints. We develop a simple and efficient implementation of NNs to produce the optimal DNA codes, which opens the door to applying neural networks for DNA-based data storage. Further, the combinatorial bio-constraints are introduced to improve the lower bounds and to avoid the occurrence of errors in the DNA codes. Our goal is to compute large DNA codes in shorter sequences, which should avoid non-specific hybridization errors by satisfying the bio-constrained coding. The proposed model yields a significant improvement in the DNA library by explicitly constructing larger codes than the prior published codes.
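The three combinatorial constraints named above can be checked directly on candidate codewords. The sketch below uses their usual definitions (GC count of n/2, pairwise Hamming distance of at least d, and Hamming distance of at least d between any codeword and the reverse complement of any codeword, itself included) and builds a code with a naive greedy scan. The greedy scan is a stand-in chosen for brevity, not the neural-network model of the paper, and all function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_count(seq: str) -> int:
    """Number of G and C bases in seq."""
    return seq.count("G") + seq.count("C")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def greedy_code(n: int, d: int) -> list[str]:
    """Greedily collect length-n words meeting GC, Hamming and RC constraints."""
    code: list[str] = []
    for bases in product("ACGT", repeat=n):
        word = "".join(bases)
        if gc_count(word) != n // 2:  # constant GC-content constraint
            continue
        if hamming(word, reverse_complement(word)) < d:  # RC constraint on itself
            continue
        if all(hamming(word, c) >= d and
               hamming(word, reverse_complement(c)) >= d for c in code):
            code.append(word)
    return code


if __name__ == "__main__":
    code = greedy_code(n=4, d=2)
    print(len(code), code[:5])
```

The size of such a code is a lower bound on the largest code satisfying the constraints, which is the quantity the listed papers try to improve with better search methods.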
|
29
|
Fu H, Lv H, Zhang Q. Using entropy-driven amplifier circuit response to build nonlinear model under the influence of Lévy jump. BMC Bioinformatics 2022; 22:437. [PMID: 35057730 PMCID: PMC8772049 DOI: 10.1186/s12859-021-04331-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/05/2021] [Accepted: 08/23/2021] [Indexed: 02/06/2023] Open
Abstract
Background Bioinformatics is a discipline that combines life science and computer science. It mainly uses computer technology to study the laws of biological systems. The design and realization of DNA circuit reactions is an important topic in bioinformatics. Results In this paper, a nonlinear dynamic system model with Lévy jump based on the entropy-driven amplifier (EDA) circuit response is studied. Firstly, a nonlinear biochemical reaction system model is established based on the EDA circuit response. Considering the influence of disturbance factors on the system, a nonlinear biochemical reaction system with Lévy jump is built. Secondly, to show that the constructed system is physically meaningful, the existence and uniqueness of the system solution are analyzed. Next, sufficient conditions for the end and continuation of the EDA circuit reaction are established. Finally, the correctness of the theoretical results is confirmed by numerical simulation, and the reactivity of THTSignal in the EDA circuit under different noise intensities is verified. Conclusions In the EDA circuit reaction, the intensity of external noise has a significant impact on the system. The end of the EDA circuit reaction is closely related to the intensity of Lévy noise, and Lévy jump has a significant impact on the nature of the biochemical reaction system.
|
30
|
Cui X, Liu Y, Zhang Q. DNA tile self-assembly driven by antibody-mediated four-way branch migration. Analyst 2022; 147:2223-2230. [DOI: 10.1039/d1an02273c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The antibody-mediated four-way branch migration mechanism provides a novel idea for realizing the assembly of nanostructures, simply by attaching structures such as tiles, proteins, quantum dots, etc. to the ends of the four-way branches.
Affiliation(s)
- Xingdi Cui
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Yuan Liu
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Qiang Zhang
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
31
|
Gao S, Wu R, Zhang Q. A novel strategy for programmable DNA tile self-assembly with a DNAzyme-mediated DNA cross circuit. New J Chem 2022. [DOI: 10.1039/d1nj06012k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The proposed strategy promotes the controllability and modularization of trigger elements, realizes programmable molecular self-assembly, and has broad applications for the construction of DNA nanodevices.
Affiliation(s)
- Siqi Gao
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Ranfeng Wu
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Qiang Zhang
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
32
|
Xing C, Zheng X, Zhang Q. Constructing DNA logic circuits based on the toehold preemption mechanism. RSC Adv 2021; 12:338-345. [PMID: 35424506 PMCID: PMC8978688 DOI: 10.1039/d1ra08687a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/28/2021] [Accepted: 12/14/2021] [Indexed: 11/21/2022]
Abstract
Strand displacement technology and ribozyme digestion technology have enriched the intelligent toolbox of molecular computing and provided more methods for the construction of DNA logic circuits. In recent years, DNA logic circuits have developed rapidly, and their scalability and accuracy in molecular computing and information processing have been fully demonstrated. However, existing DNA logic circuits still have some problems such as high complexity of DNA strands (number of DNA strands) hindering the expansion of practical computing tasks. In view of the above problems, we presented a toehold preemption mechanism and applied it to construct DNA logic circuits using E6-type DNAzymes, such as half adder circuit, half subtractor circuit, and 4-bit square root logic circuit. Different from the dual-track logic expressions, all the signals in the circuits of this study were monorail which substantially reduced the number of DNA strands in the DNA logic circuits. The presented preemption mechanism provides a way to simplify the implementation of large and complex DNA integrated circuits.
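For orientation, the logic functions named in this abstract (half adder, half subtractor, 4-bit square root) can be written out as plain Boolean truth functions. The sketch below only illustrates the target input/output behavior; it says nothing about the DNAzyme or toehold preemption chemistry, and the function names are hypothetical.

```python
import math

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b

def half_subtractor(a: int, b: int) -> tuple[int, int]:
    """Return (difference, borrow) for a - b on two input bits."""
    return a ^ b, (1 - a) & b

def square_root_4bit(x: int) -> int:
    """Floor of the square root of a 4-bit input (0..15); fits in 2 bits."""
    if not 0 <= x <= 15:
        raise ValueError("input must be a 4-bit value")
    return math.isqrt(x)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, half_adder(a, b), half_subtractor(a, b))
    print([square_root_4bit(x) for x in range(16)])
```

A molecular implementation would have to reproduce these tables with single-rail signals, which is the simplification the abstract emphasizes.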
Affiliation(s)
- Cuicui Xing
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
- Xuedong Zheng
  - College of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
- Qiang Zhang
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian 116622, China
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
33
|
Xu S, Liu Y, Zhou S, Zhang Q, Kasabov NK. DNA Matrix Operation Based on the Mechanism of the DNAzyme Binding to Auxiliary Strands to Cleave the Substrate. Biomolecules 2021; 11:1797. [PMID: 34944442 PMCID: PMC8698824 DOI: 10.3390/biom11121797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2021] [Revised: 11/21/2021] [Accepted: 11/27/2021] [Indexed: 11/16/2022]
Abstract
Numerical computation is a focus of DNA computing, and matrix operations are among the most basic and frequently used operations in numerical computation. As an important computing tool, matrix operations are often used to deal with intensive computing tasks. During calculation, the speed and accuracy of matrix operations directly affect the performance of the entire computing system. Therefore, it is important to find a way to perform matrix calculations that ensures speed and improves accuracy. This paper proposes a DNA matrix operation method based on the mechanism of the DNAzyme binding to auxiliary strands to cleave the substrate. In this mechanism, the DNAzyme can bind the substrate only when two auxiliary strands are connected. If either of the two auxiliary strands is absent, the DNAzyme does not cleave the substrate. Based on this mechanism, the multiplication of two matrices is realized: the two types of auxiliary strands serve as the elements of the two matrices and participate in the operation, and they then combine with the DNAzyme to cut the substrate and output the result of the matrix operation. This research provides a new method of matrix operations and provides ideas for more complex computing systems.
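The two-strand requirement described above behaves like a logical AND, which is what a single product term in a matrix multiplication over 0/1 entries needs. The sketch below is a software analogy under that assumption only: entries are taken to be 0 or 1, each `cleaves` call stands in for one DNAzyme reaction, and each output entry counts the cleavage events. The names and the counting readout are hypothetical and are not taken from the paper.

```python
def cleaves(aux_a: int, aux_b: int) -> int:
    """Model one reaction: cleavage (1) only if both auxiliary strands are present."""
    return 1 if aux_a and aux_b else 0

def binary_matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Multiply 0/1 matrices, counting AND-type cleavage events per output entry."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [
        [sum(cleaves(row[k], b[k][j]) for k in range(len(b))) for j in range(cols)]
        for row in a
    ]

if __name__ == "__main__":
    A = [[1, 0], [1, 1]]
    B = [[1, 1], [0, 1]]
    print(binary_matmul(A, B))
```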
Affiliation(s)
- Shaoxia Xu
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
- Yuan Liu
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Shihua Zhou
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
- Qiang Zhang
  - Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Nikola K. Kasabov
  - Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
  - Intelligent Systems Research Center, Ulster University, Londonderry BT52 1SA, UK
|
34
|
Wu J, Zheng Y, Wang B, Zhang Q. Enhancing Physical and Thermodynamic Properties of DNA Storage Sets with End-constraint. IEEE Trans Nanobioscience 2021; 21:184-193. [PMID: 34662278 DOI: 10.1109/tnb.2021.3121278] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
With the explosion of data, DNA is considered an ideal carrier for storage due to its high storage density. However, low-quality DNA sets hamper the widespread use of DNA storage. This work proposes a new method to design high-quality DNA storage sets. Firstly, random switch and double-weight offspring strategies are introduced in the Double-strategy Black Widow Optimization Algorithm (DBWO). Experimental results on 26 benchmark functions show that the exploration and exploitation abilities of DBWO are greatly improved over previous work. Secondly, DBWO is applied to designing DNA storage sets, and compared with previous work, the lower bounds of storage sets are boosted by 9%-37%. Finally, to improve the poor stability of sequences, the End-constraint is proposed for designing DNA storage sets. Measurements of the number of hairpin structures, melting temperature, and minimum free energy show that, with this constraint, DBWO can not only construct a larger number of storage sets but also enhance the physical and thermodynamic properties of DNA storage sets.
|
35
|
Shi Y, Hu Y, Wang B. Image Encryption Scheme Based on Multiscale Block Compressed Sensing and Markov Model. Entropy (Basel, Switzerland) 2021; 23:1297. [PMID: 34682021 PMCID: PMC8534541 DOI: 10.3390/e23101297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/01/2021] [Revised: 09/23/2021] [Accepted: 09/26/2021] [Indexed: 11/25/2022]
Abstract
Many image encryption schemes based on compressed sensing have the problem of poor quality of decrypted images. To deal with this problem, this paper develops an image encryption scheme by multiscale block compressed sensing. The image is decomposed by a three-level wavelet transform, and the sampling rates of coefficient matrices at all levels are calculated according to multiscale block compressed sensing theory and the given compression ratio. The first round of permutation is performed on the internal elements of the coefficient matrices at all levels. Then the coefficient matrix is compressed and combined. The second round of permutation is performed on the combined matrix based on the state transition matrix. Independent diffusion and forward-backward diffusion between pixels are used to obtain the final cipher image. Different sampling rates are set by considering the difference of information between an image's low- and high-frequency parts. Therefore, the reconstruction quality of the decrypted image is better than that of other schemes, which set one sampling rate on an entire image. The proposed scheme takes full advantage of the randomness of the Markov model and shows an excellent encryption effect to resist various attacks.
Affiliation(s)
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China; (Y.S.); (Y.H.)
|
36
|
Xiaoru L, Ling G. Combinatorial constraint coding based on the EORS algorithm in DNA storage. PLoS One 2021; 16:e0255376. [PMID: 34324571 PMCID: PMC8320985 DOI: 10.1371/journal.pone.0255376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Accepted: 07/15/2021] [Indexed: 11/19/2022]
Abstract
The development of information technology has produced massive amounts of data, which has brought severe challenges to information storage. Traditional electronic storage media cannot keep up with the ever-increasing demand for data storage, but in its place DNA has emerged as a feasible storage medium with high density, large storage capacity and strong durability. In DNA data storage, many different approaches can be used to encode data into codewords. DNA coding is a key step in DNA storage and can directly affect storage performance and data integrity. However, since errors are prone to occur in DNA synthesis and sequencing, and non-specific hybridization is prone to occur in the solution, how to effectively encode DNA has become an urgent problem to be solved. In this article, we propose a DNA storage coding method based on the equilibrium optimization random search (EORS) algorithm, which meets the Hamming distance, GC content and no-runlength constraints and can reduce the error rate in storage. Simulation experiments have shown that the size of the DNA storage code set constructed by the EORS algorithm that meets the combination constraints has increased by an average of 11% compared with previous work. The increase in the code set means that shorter DNA chains can be used to store more data.
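The three combinatorial constraints named in this abstract can be checked with a few lines of code. The following is a minimal sketch under common conventions that the abstract itself does not spell out: GC content is assumed to mean exactly 50% G or C bases, no-runlength is assumed to mean no two identical adjacent bases, and the Hamming constraint is assumed to require a minimum pairwise distance `min_distance`. It validates a given code set and is not the EORS search itself; all function names are hypothetical.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def gc_balanced(seq: str) -> bool:
    """Assumed GC-content constraint: exactly half of the bases are G or C."""
    return 2 * sum(base in "GC" for base in seq) == len(seq)

def no_runlength(seq: str) -> bool:
    """Assumed no-runlength constraint: no two identical adjacent bases."""
    return all(x != y for x, y in zip(seq, seq[1:]))

def is_valid_code_set(codewords: list[str], min_distance: int) -> bool:
    """Check every codeword and every pair against the combined constraints."""
    if not all(gc_balanced(w) and no_runlength(w) for w in codewords):
        return False
    return all(hamming(a, b) >= min_distance for a, b in combinations(codewords, 2))

if __name__ == "__main__":
    print(is_valid_code_set(["ACGT", "CATG", "GTAC"], min_distance=3))
```

A search algorithm would call a check of this kind on candidate sets; a larger valid set for a fixed length and distance corresponds to the larger code set sizes the abstract reports.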
Affiliation(s)
- Li Xiaoru
  - Hulunbeier Vocational and Technical College, Hulunbeier, Inner Mongolia, China
- Guo Ling
  - Baidu Co., Ltd., Shanghai, China
|
37
|
Constrained transformer network for ECG signal processing and arrhythmia classification. BMC Med Inform Decis Mak 2021; 21:184. [PMID: 34107920 PMCID: PMC8191107 DOI: 10.1186/s12911-021-01546-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 03/27/2021] [Accepted: 05/25/2021] [Indexed: 11/10/2022]
Abstract
Background: Heart disease diagnosis is a challenging task, and it is important to extract useful information from the massive amount of electrocardiogram (ECG) records of patients. High-precision diagnostic identification of ECG can save clinicians and cardiologists considerable time while helping reduce the possibility of misdiagnosis. Currently, some deep learning-based methods can effectively perform feature selection and classification prediction, reducing the consumption of manpower. Methods: In this work, an end-to-end deep learning framework based on a convolutional neural network (CNN) is proposed for ECG signal processing and arrhythmia classification. In the framework, a transformer network is embedded in the CNN to capture the temporal information of ECG signals, and a new link constraint is introduced to the loss function to enhance the classification ability of the embedding vector. Results: To evaluate the proposed method, extensive experiments based on real-world data were conducted. Experimental results show that the proposed model achieves better performance than most baselines. The results also show that the transformer network pays more attention to the temporal continuity of the data and captures the hidden deep features of the data well. The link constraint strengthens the constraint on the embedded features and effectively suppresses the effect of data imbalance on the results. Conclusions: In this paper, an end-to-end model is used to process ECG signals and classify arrhythmia. The model combines a CNN and a transformer network to extract temporal information from ECG signals and is capable of performing arrhythmia classification with acceptable accuracy. The model can help cardiologists perform assisted diagnosis of heart disease and improve the efficiency of healthcare delivery.
|
38
|
Li X, Wei Z, Wang B, Song T. Stable DNA Sequence Over Close-Ending and Pairing Sequences Constraint. Front Genet 2021; 12:644484. [PMID: 34079580 PMCID: PMC8165483 DOI: 10.3389/fgene.2021.644484] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Accepted: 04/12/2021] [Indexed: 11/15/2022]
Abstract
DNA computing is a new method based on molecular biotechnology to solve complex problems. The design of DNA sequences is a multi-objective optimization problem in DNA computing, whose objective is to obtain optimized sequences that satisfy multiple constraints to improve the quality of the sequences. However, previously optimized DNA sequences reacted with each other, which reduced the number of DNA sequences that could be used for molecular hybridization in the solution and thus reduced the accuracy of DNA computing. In addition, a DNA sequence and its complement follow the principle of complementary pairing, and sequences with G or C bases at both ends are more stable. To address these problems, the Pairing Sequences Constraint (PSC) and the Close-ending constraint, along with the Improved Chaos Whale (ICW) optimization algorithm, were proposed to construct a DNA sequence set that satisfies the combination of constraints. The ICW optimization algorithm adds a new predator–prey strategy and sine and cosine functions under the action of chaos. Compared with other algorithms, among the 23 benchmark functions, the new algorithm obtained the minimum value for one-third of the functions and two-thirds of the current minimum value. The DNA sequences satisfying the constraint combination obtained the minimum fitness values and had stable and usable structures.
Affiliation(s)
- Xue Li
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Ziqi Wei
  - School of Software, Tsinghua University, Beijing, China
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Tao Song
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao, China
|
39
|
Zheng Y, Wu J, Wang B. CLGBO: An Algorithm for Constructing Highly Robust Coding Sets for DNA Storage. Front Genet 2021; 12:644945. [PMID: 34017354 PMCID: PMC8129200 DOI: 10.3389/fgene.2021.644945] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Accepted: 04/08/2021] [Indexed: 11/22/2022]
Abstract
In the era of big data, new storage media are urgently needed because the storage capacity for global data cannot meet the exponential growth of information. Deoxyribonucleic acid (DNA) storage, where primer and address sequences play a crucial role, is one of the most promising storage media because of its high density, large capacity and durability. In this study, we describe an enhanced gradient-based optimizer that includes the Cauchy and Levy mutation strategy (CLGBO) to construct DNA coding sets, which are used as primer and address libraries. Our experimental results show that the lower bounds of DNA storage coding sets obtained using the CLGBO algorithm are increased by 4.3–13.5% compared with previous work. The non-adjacent subsequence constraint was introduced to reduce the error rate in the storage process. This helps to resolve the problem that arises when consecutive repetitive subsequences in the sequence cause errors in DNA storage. We made use of the CLGBO algorithm and the non-adjacent subsequence constraint to construct larger and more highly robust coding sets.
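The non-adjacent subsequence constraint targets consecutive repetitive subsequences. A minimal sketch of such a check follows, assuming the constraint forbids any subsequence of at least `min_len` bases from being immediately followed by an identical copy of itself (for example `ACGACG`). The exact definition and the parameter `min_len` are assumptions, not details taken from the abstract, and the function name is hypothetical.

```python
def has_adjacent_repeat(seq: str, min_len: int = 2) -> bool:
    """Return True if some block of >= min_len bases is immediately repeated."""
    n = len(seq)
    for size in range(min_len, n // 2 + 1):
        for start in range(n - 2 * size + 1):
            # Compare a block with the block that directly follows it.
            if seq[start:start + size] == seq[start + size:start + 2 * size]:
                return True
    return False

if __name__ == "__main__":
    for s in ("ACGACG", "GATATC", "ACGTCA"):
        print(s, has_adjacent_repeat(s))
```

Sequences for which this check returns True would be rejected during code set construction under the assumed definition.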
Affiliation(s)
- Yanfen Zheng
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Jieqiong Wu
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
|
40
|
Cao B, Zhang X, Wu J, Wang B, Zhang Q, Wei X. Minimum Free Energy Coding for DNA Storage. IEEE Trans Nanobioscience 2021; 20:212-222. [PMID: 33534710 DOI: 10.1109/tnb.2021.3056351] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/10/2022]
Abstract
With the development of information technology, huge amounts of data are being produced. How to store data efficiently and at low cost has become an urgent problem. DNA is a high-density and persistent medium, making DNA storage a viable solution. In a DNA data storage system, the first consideration is how to encode the data effectively into code words. However, DNA strands are prone to non-specific hybridization during the hybridization reaction process and are prone to errors during synthesis and sequencing. In order to reduce the error rate, a thermodynamic minimum free energy (MFE) constraint is proposed and applied to the construction of coding sets for DNA storage. The Brownian multi-verse optimizer (BMVO) algorithm, based on the multi-verse optimizer (MVO) algorithm, incorporates the ideas of Brownian motion and the Nelder-Mead method, and it is used to design a better DNA storage coding set. Compared with previous works, the coding set has increased in size by 4%-50% and has better thermodynamic properties. With the improvement of the quality of the DNA coding set, the accuracy of reading and writing and the robustness of the DNA storage system are also enhanced.
|
41
|
Chen C, Wu R, Wang B. Development of a neuron model based on DNAzyme regulation. RSC Adv 2021; 11:9985-9994. [PMID: 35423534 PMCID: PMC8695483 DOI: 10.1039/d0ra10515e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/14/2020] [Accepted: 03/02/2021] [Indexed: 12/25/2022]
Abstract
Neural networks based on DNA molecular circuits play an important role in molecular information processing and artificial intelligence systems. In fact, some DNA molecular systems can become dynamic units with the assistance of DNAzymes. Complex DNA circuits can spontaneously induce corresponding feedback behaviors when their inputs change. However, most of the reported DNA neural networks have been implemented by the toehold-mediated strand displacement (TMSD) method. Therefore, it was important to develop a method to build a neural network utilizing the TMSD mechanism and adding a mechanism to account for modulation by DNAzymes. In this study, we designed a model of a DNA neuron controlled by DNAzymes. We proposed an approach based on the DNAzyme modulation of neuronal function, combining two reaction mechanisms: DNAzyme digestion and TMSD. Using the DNAzyme adjustment, each component simulating the characteristics of neurons was constructed. By altering the input and weight of the neuron model, we verified the correctness of the computational function of the neurons. Furthermore, in order to verify the application potential of the neurons in specific functions, a voting machine was successfully implemented. The proposed neuron model regulated by DNAzymes is simple to construct and possesses strong scalability, giving it great potential for use in the construction of large neural networks.
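The computational function of such a neuron, and the voting machine built from it, can be illustrated in software with a weighted-sum threshold unit. The sketch below is an abstraction only: the weights, the threshold, and the three-voter majority rule are assumptions chosen for illustration, not values reported in the paper, and the function names are hypothetical.

```python
def neuron(inputs: list[float], weights: list[float], threshold: float) -> int:
    """Fire (1) when the weighted sum of the inputs reaches the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have equal length")
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def majority_vote(votes: list[int]) -> int:
    """Voting machine as a single neuron: equal weights, threshold above half."""
    threshold = len(votes) // 2 + 1
    return neuron([float(v) for v in votes], [1.0] * len(votes), float(threshold))

if __name__ == "__main__":
    for votes in ([1, 1, 0], [1, 0, 0], [1, 1, 1]):
        print(votes, majority_vote(votes))
```

Changing the weights or the threshold changes which input combinations fire the unit, which mirrors the abstract's description of verifying the neuron by altering its inputs and weights.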
Affiliation(s)
- Cong Chen
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ranfeng Wu
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
|