1
Asim MN, Ibrahim MA, Zaib A, Dengel A. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Front Med (Lausanne) 2025; 12:1503229. [PMID: 40265190 PMCID: PMC12011883 DOI: 10.3389/fmed.2025.1503229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 03/10/2025] [Indexed: 04/24/2025] Open
Abstract
Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors, including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and understanding the factors that can modify it. This analysis helps in the early detection of genetic diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for a comprehensive review that bridges the gap between both fields, the contributions of this paper are manifold. It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI paradigms, namely classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis tasks by consolidating information on 36 diverse biological databases that can be used to develop benchmark datasets for 44 different DNA sequence analysis tasks. To enable performance comparisons between new and existing AI predictors, it provides insights into 140 benchmark datasets related to 44 distinct DNA sequence analysis tasks. It presents applications of word embeddings and language models across 44 distinct DNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance values of predictive pipelines based on 39 word embeddings and 67 language models, as well as the top-performing traditional sequence encoding-based predictors and their performance across 44 DNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Arooj Zaib
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Andreas Dengel
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
2
Yang M, Shi Y, Song Q, Wei Z, Dun X, Wang Z, Wang Z, Qiu CW, Zhang H, Cheng X. Optical sorting: past, present and future. Light: Science & Applications 2025; 14:103. [PMID: 40011460 DOI: 10.1038/s41377-024-01734-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 12/02/2024] [Accepted: 12/24/2024] [Indexed: 02/28/2025]
Abstract
Optical sorting combines optical tweezers with diverse techniques, including optical spectroscopy, artificial intelligence (AI) and immunoassay, to provide unprecedented capabilities in particle sorting. Compared with other methods such as microfluidics, acoustics and electrophoresis, optical sorting offers appreciable advantages in nanoscale precision, high resolution and non-invasiveness, and is becoming increasingly indispensable in the fields of biophysics, chemistry, and materials science. This review aims to offer a comprehensive overview of the history, development, and perspectives of various optical sorting techniques, categorised as passive and active sorting methods. To begin, we elucidate the fundamental physics and attributes of both conventional and exotic optical forces. We then explore the sorting capabilities of active optical sorting, which fuses optical tweezers with a diversity of techniques, including Raman spectroscopy and machine learning. Afterwards, we reveal the essential roles played by deterministic light fields, configured with lens systems or metasurfaces, in the passive sorting of particles based on their varying sizes and shapes, sorting resolutions and speeds. We conclude with our vision of the most promising and futuristic directions, including AI-facilitated ultrafast and bio-morphology-selective sorting. It can be envisioned that optical sorting will become a revolutionary tool in scientific research and practical biomedical applications.
Affiliation(s)
- Meng Yang
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China
- Yuzhi Shi
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China.
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China.
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China.
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China.
- Qinghua Song
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Zeyong Wei
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China
- Xiong Dun
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China
- Zhiming Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Zhanshan Wang
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China
- Cheng-Wei Qiu
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 117583, Singapore.
- Hui Zhang
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China.
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China.
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China.
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China.
- Xinbin Cheng
- Institute of Precision Optical Engineering, School of Physics Science and Engineering, Tongji University, Shanghai, 200092, China.
- MOE Key Laboratory of Advanced Micro-Structured Materials, Shanghai, 200092, China.
- Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China.
- Shanghai Frontiers Science Center of Digital Optics, Shanghai, 200092, China.
3
Xi J, Cheng X, Liu J. Causal relationship between gout and liver cancer: A Mendelian randomization and transcriptome analysis. Medicine (Baltimore) 2024; 103:e40299. [PMID: 39533594 PMCID: PMC11557110 DOI: 10.1097/md.0000000000040299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 10/11/2024] [Indexed: 11/16/2024] Open
Abstract
Gout is an inflammatory arthritis resulting from urate crystal deposition, now recognized as part of metabolic syndrome. Hyperuricemia, a hallmark of gout, is associated with various health complications, including liver cancer. Observational studies indicate a link between gout and increased cancer incidence. However, the causal relationship between gout and hepatocellular carcinoma remains uncertain. This study utilizes Mendelian randomization (MR) to explore this connection, minimizing confounding factors commonly present in observational studies. Genome-wide association study data for gout and liver cancer were sourced from the UK Biobank. We selected single nucleotide polymorphisms that are strongly associated with gout and liver cancer as instrumental variables for the analysis. We conducted 2-sample MR analysis using multiple MR methods (MR-Egger, weighted median, inverse variance weighting, and weighted mode) to evaluate causality. Co-localization and transcriptomic analyses were employed to identify target genes and assess their expression in hepatocellular carcinoma tissues. The 2-sample MR analysis indicated a significant causal relationship between gout and heightened liver cancer risk (P_IVW = .014). Co-localization analysis identified phosphatidylethanolamine N-methyltransferase (PEMT) as a crucial gene associated with gout (pH4 = 0.990). Transcriptomic data showed that PEMT expression was significantly higher in normal liver tissues compared to malignant samples (P < .001), and higher PEMT levels correlated with improved survival outcomes (P = .045). Immunohistochemical analysis revealed lower PEMT expression in hepatocellular carcinoma from patients with concurrent gout compared to those without (P < .05). The results indicate that gout increases the risk of hepatocellular carcinoma, with PEMT potentially playing a key role. Although this study focused on European populations, indicating a need for further research in diverse groups, the results emphasize the potential for liver cancer screening in newly diagnosed gout patients. Understanding the relationship between these conditions may inform future clinical practices and cancer prevention strategies.
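The inverse variance weighting (IVW) step named in the abstract above can be illustrated with a minimal sketch. It assumes the standard fixed-effect IVW form, in which each SNP's Wald ratio (outcome effect divided by exposure effect) is weighted by the inverse of its approximate variance. The abstract does not describe the authors' exact implementation, and the function name `ivw_estimate` and the toy effect sizes are hypothetical, not data from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    Illustrative helper (an assumption, not the paper's code).
    beta_exposure: SNP effects on the exposure (e.g. gout)
    beta_outcome:  SNP effects on the outcome (e.g. liver cancer)
    se_outcome:    standard errors of the outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")

    # Wald ratio per SNP: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each ratio: beta_x^2 / se_y^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]

    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Toy summary statistics for two SNPs (hypothetical values).
toy_beta_gout = [0.10, 0.20]
toy_beta_cancer = [0.05, 0.10]
toy_se_cancer = [0.01, 0.02]

estimate, std_error = ivw_estimate(toy_beta_gout, toy_beta_cancer, toy_se_cancer)
print(f"IVW estimate: {estimate:.3f}, standard error: {std_error:.3f}")
```

In practice the other listed methods (MR-Egger, weighted median, weighted mode) serve as sensitivity analyses, because the fixed-effect IVW estimate is only unbiased when every instrument is valid.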
Affiliation(s)
- Jiaqi Xi
- Department of Endocrinology, People’s Liberation Army General Hospital of Southern Theatre Command, Guangzhou, China
- Xiaofang Cheng
- Department of Endocrinology, People’s Liberation Army General Hospital of Southern Theatre Command, Guangzhou, China
- Department of Liver Injury and Repair, Guangxi Medical University, Nanning, China
- Jun Liu
- Department of Endocrinology, People’s Liberation Army General Hospital of Southern Theatre Command, Guangzhou, China
- Department of Liver Injury and Repair, Guangxi Medical University, Nanning, China
4
Wang L, Wang Y, Li Y, Zhou L, Liu S, Cao Y, Li Y, Liu S, Du J, Wang J, Zhu T. A prospective diagnostic model for breast cancer utilizing machine learning to examine the molecular immune infiltrate in HSPB6. J Cancer Res Clin Oncol 2024; 150:475. [PMID: 39441229 PMCID: PMC11499434 DOI: 10.1007/s00432-024-05995-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 10/11/2024] [Indexed: 10/25/2024]
Abstract
BACKGROUND: Breast cancer is a significant public health issue worldwide, being the most prevalent cancer among women and a leading cause of cancer-related death. The molecular processes that drive breast cancer progression are not fully elucidated, highlighting the intricate nature of the underlying biology and its crucial impact on global health. The objective of this research was to perform bioinformatics analyses on breast cancer-related datasets to gain a comprehensive understanding of the molecular mechanisms at play and to identify key genes associated with the disease.
METHODS: The analyses involved differential gene expression analysis, Gene Set Enrichment Analysis (GSEA), Weighted Gene Co-Expression Network Analysis (WGCNA), and machine learning algorithms. Furthermore, in vitro cell experiments examined the impact of HSPB6 on cell migration, proliferation, and apoptosis.
RESULTS: The study identified multiple genes that were differentially expressed in breast cancer, notably FHL1 and HSPB6. A machine learning model was trained for breast cancer diagnosis using these genes and achieved high precision. Furthermore, analysis of immune cell infiltration revealed an enrichment of Tregs and M2 macrophages in the treated group, showcasing its significant impact on the tumor's immunological context. A temporal analysis of breast cancer cells using single-cell RNA sequencing provided insights into cellular developmental trajectories and highlighted changes in the expression patterns of key genes during disease progression. The upregulation of HSPB6 in MCF7 cells significantly inhibited both cell migration and proliferation, suggesting that promoting HSPB6 expression could induce ferroptosis in breast cancer cells.
CONCLUSION: Our findings identify compelling molecular targets and distinctive diagnostic markers for the clinical management of breast cancer. These data will serve as crucial guidance for further research in the field.
Affiliation(s)
- Lizhe Wang
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Yu Wang
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Yueyang Li
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Li Zhou
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Sihan Liu
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Yongyi Cao
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Yuzhi Li
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Shenting Liu
- Department of Oncology, Wuhu Second People's Hospital, No. 66 Municipal Access Road, Wuhu City, 241000, Anhui Province, China
- Jiahui Du
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Jin Wang
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China
- Ting Zhu
- Department of Oncology, The Third Affiliated Hospital of Anhui Medical University, Hefei First People's Hospital, No.390 Huaihe Road, Luyang District, Hefei City, 230071, Anhui Province, China.
5
Jiang M, Gu X, Xu Y, Wang J. Metabolism-associated molecular classification and prognosis signature of head and neck squamous cell carcinoma. Heliyon 2024; 10:e27587. [PMID: 38501009 PMCID: PMC10945276 DOI: 10.1016/j.heliyon.2024.e27587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 02/25/2024] [Accepted: 03/04/2024] [Indexed: 03/20/2024] Open
Abstract
Although the fundamental processes and chemical changes in metabolic programs have been elucidated in many cancers, the expression patterns of metabolism-related genes in head and neck squamous cell carcinoma (HNSCC) remain unclear. mRNA expression profiles from The Cancer Genome Atlas, comprising 502 tumour and 44 normal samples, were extracted. We explored the biological functions and prognostic roles of metabolism-associated genes in patients with HNSCC. The results indicated that patients with HNSCC could be divided into three molecular subtypes (C1, C2 and C3) based on 249 metabolism-related genes. Clinical characteristics, prognostic outcomes, and biological functions differed markedly among the three subtypes. The molecular subtypes also had different tumour microenvironments and immune infiltration levels. The established prognostic model with 17 signature genes could predict the prognosis of patients with HNSCC and was validated using an independent cohort dataset. An individual risk scoring tool was developed using the risk score and clinical parameters; the risk score was an independent prognostic factor for patients with HNSCC. The risk strata differed in clinical characteristics, biological features, tumour microenvironments and immune infiltration levels. Our study could be used for clinical risk management and to help conduct precision medicine for patients with HNSCC.
Affiliation(s)
- Mengxian Jiang
- Department of Otorhinolaryngology Head and Neck Surgery, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430000, China
- Xiang Gu
- Department of Otorhinolaryngology Head and Neck Surgery, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430000, China
- Yexing Xu
- Department of Otorhinolaryngology Head and Neck Surgery, Maternal and Child Health of Hubei Province, Tongji Medical College, Huazhong University of Science and Technology, Hubei Province, 430000, China
- Jing Wang
- Department of Otorhinolaryngology Head and Neck Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Hubei Province, 430000, China
6
Hamraz M, Ali A, Mashwani WK, Aldahmani S, Khan Z. Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio. PLoS One 2023; 18:e0284619. [PMID: 37098036 PMCID: PMC10128961 DOI: 10.1371/journal.pone.0284619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/04/2023] [Indexed: 04/26/2023] Open
Abstract
Feature selection in high dimensional gene expression datasets reduces not only the dimension of the data, but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method called weighted signal to noise ratio (WSNR), which exploits feature weights based on support vectors and on the signal to noise ratio, with the objective of identifying the most informative genes in high dimensional classification problems. The combination of two state-of-the-art procedures enables the extraction of the most informative genes. The corresponding weights from these procedures are multiplied and arranged in decreasing order. A larger weight for a feature indicates greater discriminatory power in classifying the tissue samples into their true classes. The method is validated on eight gene expression datasets. Moreover, the results of the proposed method (WSNR) are compared with those of four well-known feature selection methods. We found that WSNR outperforms the competing methods on 6 out of 8 datasets. Box-plots and bar-plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data. Simulation analysis reveals that WSNR outperforms all the other methods included in the study.
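The ranking step described in the abstract above (multiply the two per-gene weights, then sort in decreasing order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signal to noise ratio is taken to be the common two-class form |mean_0 - mean_1| / (sd_0 + sd_1), the support-vector-based weights are supplied as precomputed inputs and used in absolute value, and the names `signal_to_noise`, `wsnr_ranking`, and the toy genes are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Two-class signal to noise ratio of one gene (assumed definition).

    values: expression values of the gene, one per sample
    labels: class label (0 or 1) of each sample
    """
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    # Requires at least two samples per class and non-zero total spread.
    return abs(mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def wsnr_ranking(expression, labels, svm_weights):
    """Rank genes by (|SVM weight| * SNR), largest product first.

    expression:  dict mapping gene name -> list of expression values
    svm_weights: dict mapping gene name -> weight from a linear SVM
                 (assumed to be computed elsewhere)
    """
    scores = {
        gene: abs(svm_weights[gene]) * signal_to_noise(values, labels)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: six samples, three per class (hypothetical values).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expression = {
    "gene_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "gene_b": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
}
toy_svm_weights = {"gene_a": 0.5, "gene_b": -2.0}

for gene, score in wsnr_ranking(toy_expression, toy_labels, toy_svm_weights):
    print(gene, round(score, 3))
```

Taking the product means a gene ranks highly only when both criteria agree. In the toy data, `gene_b` has the larger SVM weight but a weak class separation, so it falls below `gene_a`.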
Affiliation(s)
- Muhammad Hamraz
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Amjad Ali
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Wali Khan Mashwani
- Institute of Numerical Sciences, Kohat University of Science and Technology, Kohat, Pakistan
- Saeed Aldahmani
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
- Zardad Khan
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
7
Mammographic Classification of Breast Cancer Microcalcifications through Extreme Gradient Boosting. Electronics 2022. [DOI: 10.3390/electronics11152435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this paper, we proposed an effective and efficient approach to the classification of breast cancer microcalcifications and evaluated the mathematical model for calcification on mammography with a large medical dataset. We employed several semi-automatic segmentation algorithms to extract 51 calcification features from mammograms, including morphologic and textural features. We adopted extreme gradient boosting (XGBoost) to classify microcalcifications. Then, we compared other machine learning techniques, including k-nearest neighbor (kNN), adaboostM1, decision tree, random decision forest (RDF), and gradient boosting decision tree (GBDT), with XGBoost. XGBoost showed the highest accuracy (90.24%) for classifying microcalcifications, and kNN demonstrated the lowest accuracy. This result demonstrates that it is essential for the classification of microcalcification to use the feature engineering method for the selection of the best composition of features. One of the contributions of this study is to present the best composition of features for efficient classification of breast cancers. This paper finds a way to select the best discriminative features as a collection to improve the accuracy. This study showed the highest accuracy (90.24%) for classifying microcalcifications with AUC = 0.89. Moreover, we highlighted the performance of various features from the dataset and found ideal parameters for classifying microcalcifications. Furthermore, we found that the XGBoost model is suitable both in theory and practice for the classification of calcifications on mammography.