1
Asim MN, Ibrahim MA, Zaib A, Dengel A. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Front Med (Lausanne) 2025; 12:1503229. [PMID: 40265190 PMCID: PMC12011883 DOI: 10.3389/fmed.2025.1503229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 03/10/2025] [Indexed: 04/24/2025]
Abstract
Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors, including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and understanding the factors that can modify it. This analysis helps in the early detection of genetic diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for a comprehensive review that bridges the gap between both fields, the contributions of this paper are manifold: It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI paradigms, namely classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis tasks by consolidating information on 36 diverse biological databases that can be used to develop benchmark datasets for 44 different DNA sequence analysis tasks. To support performance comparisons between new and existing AI predictors, it provides insights into 140 benchmark datasets related to 44 distinct DNA sequence analysis tasks. It presents applications of word embeddings and language models across 44 distinct DNA sequence analysis tasks. It streamlines the development of new predictors by surveying the performance values of predictive pipelines based on 39 word embeddings and 67 language models, as well as the top-performing traditional sequence encoding-based predictors and their performance across 44 DNA sequence analysis tasks.
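As a rough illustration of how a DNA sequence can be turned into "words" for a word embedding or language model, the sketch below splits a sequence into overlapping k-mers and counts them. The abstract does not state which tokenization the surveyed methods use; k-mer splitting is assumed here as one common choice, and the function names `kmer_tokens` and `kmer_counts` are hypothetical.

```python
from collections import Counter


def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA sequence into k-mers (assumed tokenization, not taken from the review)."""
    seq = sequence.upper()
    # Slide a window of width k over the sequence, advancing by `stride` each step.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_counts(sequence, k=3):
    """Count how often each overlapping k-mer occurs (a simple bag-of-words encoding)."""
    return Counter(kmer_tokens(sequence, k=k))


if __name__ == "__main__":
    print(kmer_tokens("ACGTAC", k=3))  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(kmer_counts("AAAA", k=2))
```

With `stride=1` the k-mers overlap, and with `stride=k` they do not. Either token list could then be passed to an embedding model, or the counts used directly as features for classification, regression, or clustering.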
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Arooj Zaib
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
2
Wei T, Lu C, Du H, Yang Q, Qi X, Liu Y, Zhang Y, Chen C, Li Y, Tang Y, Zhang WH, Tao X, Jiang N. DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes. Brief Bioinform 2024; 25:bbae484. [PMID: 39344712 PMCID: PMC11440089 DOI: 10.1093/bib/bbae484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/18/2024] [Accepted: 09/13/2024] [Indexed: 10/01/2024]
Abstract
Phages, the natural predators of bacteria, were discovered more than 100 years ago. However, increasing antimicrobial resistance rates have revitalized phage research. Methods that are less time-consuming and more efficient than wet-laboratory experiments are needed to help screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are achieved by key genes and proteins. Methods for intraspecific prediction are rare since almost all existing methods consider only interactions at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information for species is difficult to obtain. Here, we propose a new approach for interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples for prediction. We then develop DeepPBI-KG, a corresponding prediction tool based on feature selection and a deep neural network. The results show that the average area under the curve (AUC) for prediction reached 0.93 for each strain, and the overall AUC and area under the precision-recall curve reached 0.89 and 0.92, respectively, on the independent test set; these values are greater than those of other existing prediction tools. The forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, which supports the reliability of the model. In addition, intraspecific prediction experiments based on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG for intraspecific prediction. In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, can achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
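The abstract mentions K-means sampling for selecting negative samples but does not describe the procedure. The sketch below shows one plausible reading, offered as an assumption rather than as the DeepPBI-KG implementation: cluster the candidate negative feature vectors with K-means, then keep the candidate closest to each centroid so the selected negatives cover the feature space. The function names, the plain-Python K-means, and the toy data are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids (illustrative, not optimized)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def select_negatives(candidates, k, seed=0):
    """Return indices of the candidates closest to each K-means centroid
    (assumed selection rule; the paper's exact rule is not given in the abstract)."""
    chosen = []
    for centroid in kmeans(candidates, k, seed=seed):
        idx = min(range(len(candidates)),
                  key=lambda i: squared_distance(candidates[i], centroid))
        if idx not in chosen:
            chosen.append(idx)
    return chosen


if __name__ == "__main__":
    # Toy feature vectors for unlabeled phage-bacteria pairs (made-up numbers).
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(sorted(select_negatives(toy, k=2)))
```

Choosing one representative per cluster keeps the negative set diverse instead of drawing many near-duplicate pairs at random. In practice a library implementation of K-means would normally replace the hand-written loop.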
Affiliation(s)
- Tongqing Wei
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Chenqi Lu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Hanxiao Du
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Qianru Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Xin Qi
  - Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Yankun Liu
  - Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Yi Zhang
  - Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Chen Chen
  - Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Yutong Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Yuanhao Tang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Wen-Hong Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
  - Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
  - Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Xu Tao
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
  - Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
  - Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Ning Jiang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
  - Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
  - Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
3
Zhang X, Zhang D, Zhang X, Zhang X. Artificial intelligence applications in the diagnosis and treatment of bacterial infections. Front Microbiol 2024; 15:1449844. [PMID: 39165576 PMCID: PMC11334354 DOI: 10.3389/fmicb.2024.1449844] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/16/2024] [Accepted: 07/04/2024] [Indexed: 08/22/2024]
Abstract
The diagnosis and treatment of bacterial infections remain significant challenges for medicine and public health in the 21st century. Artificial Intelligence (AI) has emerged as a powerful new tool in diagnosing and treating bacterial infections. AI is rapidly revolutionizing epidemiological studies of infectious diseases, providing effective early warning, prevention, and control of outbreaks. Machine learning models provide a highly flexible way to simulate and predict the complex mechanisms of pathogen-host interactions, which is crucial for a comprehensive understanding of the nature of diseases. Machine learning-based pathogen identification technology and antimicrobial drug susceptibility testing break through the limitations of traditional methods, significantly shorten the time from sample collection to the determination of results, and greatly improve the speed and accuracy of laboratory testing. In addition, the application of AI technology to treating bacterial infections, particularly in the research and development of drugs and vaccines and in innovative therapies such as bacteriophage therapy, provides new strategies for improving therapy and curbing bacterial resistance. Although AI has broad application prospects in diagnosing and treating bacterial infections, significant challenges remain in data quality and quantity, model interpretability, clinical integration, and patient privacy protection. To overcome these challenges and realize widespread application in clinical practice, interdisciplinary cooperation, technology innovation, and policy support are all essential. In summary, with continuous advancements and in-depth application of AI technology, AI will enable doctors to more effectively address the challenge of bacterial infection, promoting the development of medical practice toward precision, efficiency, and personalization; optimizing nursing and treatment plans for patients; and providing strong support for public health safety.
Affiliation(s)
- Xiaoyu Zhang
  - First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
- Deng Zhang
  - Department of Infectious Diseases, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Xifan Zhang
  - First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
- Xin Zhang
  - First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
4
Nie W, Qiu T, Wei Y, Ding H, Guo Z, Qiu J. Advances in phage-host interaction prediction: in silico method enhances the development of phage therapies. Brief Bioinform 2024; 25:bbae117. [PMID: 38555471 PMCID: PMC10981677 DOI: 10.1093/bib/bbae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/15/2024] [Accepted: 03/02/2024] [Indexed: 04/02/2024]
Abstract
Phages can specifically recognize and kill bacteria, which gives bacteriophages important application value in bacterial identification and typing, livestock aquaculture, and the treatment of human bacterial infections. Considering the variety of bacteria that infect humans and the continuous discovery of numerous pathogenic bacteria, screening massive phage databases for suitable therapeutic phages that are capable of infecting pathogens has been a principal step in phage therapy design. Experimental methods to identify phage-host interactions (PHI) are time-consuming and expensive; high-throughput computational methods to predict PHI are therefore a potential substitute. Here, we systematically review bioinformatic methods for predicting PHI, introduce the reference databases and in silico models applied in these methods, and highlight the strengths and challenges of current tools. Finally, we discuss the application scope and future research directions of computational prediction methods, which contribute to the performance improvement of prediction models and the development of personalized phage therapy.
Affiliation(s)
- Wanchun Nie
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Tianyi Qiu
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, 200032, China
- Yiwen Wei
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Hao Ding
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Zhixiang Guo
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Jingxuan Qiu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China