1. Rezaei S, Hamedani Z, Ahmadi K, Ghannadikhosh P, Motamedi A, Athari M, Yousefi H, Rajabi AH, Abbasi A, Arabi H. Role of machine learning in molecular pathology for breast cancer: A review on gene expression profiling and RNA sequencing application. Crit Rev Oncol Hematol 2025; 213:104780. [PMID: 40419230 DOI: 10.1016/j.critrevonc.2025.104780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2025] [Revised: 05/09/2025] [Accepted: 05/22/2025] [Indexed: 05/28/2025] Open
Abstract
INTRODUCTION Breast cancer is the most prevalent cancer among women, with growing incidence and mortality rates. Despite remarkable progress in cancer research, breast cancer remains a major concern because of its complex nature, which underscores the need for innovative research and diagnostic tools. Gene signatures and biotechnology methods have performed well in the diagnosis and management of breast cancer. Artificial intelligence (AI) is now regarded as a powerful tool for analyzing data, identifying biomarkers, and improving diagnostic and prognostic accuracy. Integrating breast cancer datasets with AI can therefore play a crucial role in the control of breast cancer. This review explores advanced machine learning techniques for analyzing transcriptomic data, focusing on breast cancer subtype classification and its potential impact and limitations.
METHOD A comprehensive literature search was performed in PubMed, Scopus, WoS, Embase, and IEEE Xplore. Duplicates were removed, two reviewers screened the articles, and two additional reviewers resolved conflicts. Data were extracted on molecular methods, AI techniques, clinical targets, study populations, and data analysis methods, and these details were used to categorize relevant studies into RNA sequencing and gene expression profiling groups.
RESULT The initial search identified 7287 articles, of which 54 were retained after further screening: 24 on RNA sequencing and 30 on gene expression profiling. These studies show how AI is advancing breast cancer research through RNA sequencing and gene expression profiling. Random Forest, CNNs, SVMs, and LASSO were the most frequently applied algorithms and showed significant potential to identify biomarkers, predict survival, and optimize drug responses in the management of breast cancer.
CONCLUSION AI methods hold great potential to change the field of breast cancer, with promising progress in diagnosis, prognosis, and treatment. However, the field is still at an early stage, and larger-scale studies and interdisciplinary collaborations are needed.
Affiliation(s)
- Sahar Rezaei: Department of Nuclear Medicine, Medical School, Tabriz University of Medical Sciences, Tabriz, Iran
- Zeinab Hamedani: International School of Medicine, Zhejiang University, Zhejiang, China
- Kousar Ahmadi: Department of Anatomy, Faculty of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Parna Ghannadikhosh: Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Alireza Motamedi: Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Maedeh Athari: Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Hengameh Yousefi: Student Research Committee, School of Medicine, Islamic Azad University, Kerman Branch, Kerman, Iran
- Amir Hossein Rajabi: Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Alireza Abbasi: Artificial Intelligence Clinical Laboratory and Biological Data Bank, Shiraz University of Medical Sciences, Shiraz, Iran
- Hossein Arabi: Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, CH-1211, Geneva 4, Switzerland
2. Nagra AA, Khan AH, Abubakar M, Faheem M, Rasool A, Masood K, Hussain M. A gene selection algorithm for microarray cancer classification using an improved particle swarm optimization. Sci Rep 2024; 14:19613. [PMID: 39179674 PMCID: PMC11343852 DOI: 10.1038/s41598-024-68744-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 07/26/2024] [Indexed: 08/26/2024] Open
Abstract
Gene selection is an essential step in the classification of microarray cancer data. Gene expression cancer data (deoxyribonucleic acid microarray) make it possible to measure the expression of many genes concurrently and robustly. Particle swarm optimization (PSO) requires only simple operators and few tuning parameters when applied to gene selection. Selecting prognostic genes with low redundancy remains a major challenge, however, because PSO-based selection methods have several complications. In this research, a new variant of PSO, self-inertia weight adaptive PSO (SIW-APSO), is proposed, and the combined SIW-APSO-ELM algorithm is explored to improve the prediction accuracy of gene selection. The algorithm balances the exploitation and exploration capabilities of the improved inertia weight adaptive particle swarm optimization. SIW-APSO is employed to explore solutions: each particle updates its position and velocity iteratively through an evolutionary process. The extreme learning machine (ELM) is designed for the selection procedure. The proposed method has been employed to identify several genes in the cancer dataset. Classification uses ELM, K-centroid nearest neighbor, and support vector machine classifiers and attains higher prediction accuracy than state-of-the-art methods on microarray cancer datasets, which shows the effectiveness of the proposed method.
Affiliation(s)
- Arfan Ali Nagra: Faculty of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan
- Ali Haider Khan: Faculty of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan
- Muhammad Abubakar: Faculty of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan
- Muhammad Faheem: School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Adil Rasool: Department of Computer, Bakhtar University Kabul, Kabul, Afghanistan
- Khalid Masood: Faculty of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan
- Muzammil Hussain: Department of Computer Science, University of Central Punjab Pakistan, Lahore, 54000, Pakistan
3. Cao Y, Pi W, Lin CY, Munzner U, Ohtomo M, Akutsu T. Common Attractors in Multiple Boolean Networks. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2862-2873. [PMID: 37079419 DOI: 10.1109/tcbb.2023.3268795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Analyzing multiple networks is important to understand relevant features among different networks. Although many studies have been conducted for that purpose, not much attention has been paid to the analysis of attractors (i.e., steady states) in multiple networks. Therefore, we study common attractors and similar attractors in multiple networks to uncover hidden similarities and differences among networks using Boolean networks (BNs), where BNs have been used as a mathematical model of genetic networks and neural networks. We define three problems on detecting common attractors and similar attractors, and theoretically analyze the expected number of such objects for random BNs, where we assume that given networks have the same set of nodes (i.e., genes). We also present four methods for solving these problems. Computational experiments on randomly generated BNs are performed to demonstrate the efficiency of our proposed methods. In addition, experiments on a practical biological system, a BN model of the TGF-β signaling pathway, are performed. The result suggests that common attractors and similar attractors are useful for exploring tumor heterogeneity and homogeneity in eight cancers.
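To make the notion of a common attractor concrete, the following is a minimal sketch that finds the singleton attractors (fixed points) shared by two Boolean networks defined over the same nodes. It is an illustration only: the two three-node networks and the helper name `fixed_points` are hypothetical, the search is a brute-force enumeration of all states, and it is not one of the four methods proposed in the paper.

```python
from itertools import product


def fixed_points(update_fns):
    """Return all states s with f(s) == s, i.e. the singleton attractors.

    update_fns[i] maps the full state tuple to the next value of node i.
    Enumerates all 2**n states, so this only suits very small networks.
    """
    n = len(update_fns)
    return {
        state
        for state in product((0, 1), repeat=n)
        if tuple(f(state) for f in update_fns) == state
    }


# Two hypothetical networks over the same nodes (x0, x1, x2).
bn_a = [
    lambda s: s[1] & s[2],  # x0 <- x1 AND x2
    lambda s: s[0],         # x1 <- x0
    lambda s: s[0] | s[2],  # x2 <- x0 OR x2
]
bn_b = [
    lambda s: s[1],         # x0 <- x1
    lambda s: s[0] & s[2],  # x1 <- x0 AND x2
    lambda s: s[0],         # x2 <- x0
]

# A common singleton attractor is a state that is a fixed point of both networks.
common = fixed_points(bn_a) & fixed_points(bn_b)
print(sorted(common))
```

Here the state (0, 0, 1) is a fixed point of the first network only, so it is excluded from the intersection. Periodic attractors and "similar" attractors would need additional machinery that this sketch does not attempt.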
4. Nasser M, Yusof UK. Deep Learning Based Methods for Breast Cancer Diagnosis: A Systematic Review and Future Direction. Diagnostics (Basel) 2023; 13:161. [PMID: 36611453 PMCID: PMC9818155 DOI: 10.3390/diagnostics13010161] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/26/2022] [Revised: 12/19/2022] [Accepted: 12/19/2022] [Indexed: 01/06/2023] Open
Abstract
Breast cancer is one of the precarious conditions that affect women, and a substantive cure has not yet been discovered for it. With the advent of artificial intelligence (AI), deep learning techniques have recently been used effectively in breast cancer detection, facilitating early diagnosis and therefore increasing patients' chances of survival. Compared to classical machine learning techniques, deep learning requires less human intervention for similar feature extraction. This study presents a systematic literature review of deep learning-based methods for breast cancer detection that can guide practitioners and researchers in understanding the challenges and new trends in the field. In particular, different deep learning-based methods for breast cancer detection are investigated, focusing on genomics and histopathological imaging data. The study adopts the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), which offers a detailed analysis and synthesis of the published articles. Several studies were searched and gathered, and after eligibility screening and quality evaluation, 98 articles were identified. The results of the review indicate that the Convolutional Neural Network (CNN) is the most accurate and most extensively used model for breast cancer detection, and that accuracy is the most popular metric used for performance evaluation. Moreover, the datasets utilized for breast cancer detection and the evaluation metrics are also studied. Finally, the challenges and future research directions in breast cancer detection based on deep learning models are investigated to help researchers and practitioners acquire in-depth knowledge of and insight into the area.
5. Nassif AB, Talib MA, Nasir Q, Afadar Y, Elgendy O. Breast cancer detection using artificial intelligence techniques: A systematic literature review. Artif Intell Med 2022; 127:102276. [DOI: 10.1016/j.artmed.2022.102276] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 03/16/2021] [Revised: 10/18/2021] [Accepted: 03/04/2022] [Indexed: 02/07/2023]
6. Convolutional neural network for human cancer types prediction by integrating protein interaction networks and omics data. Sci Rep 2021; 11:20691. [PMID: 34667236 PMCID: PMC8526703 DOI: 10.1038/s41598-021-98814-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/22/2020] [Accepted: 09/14/2021] [Indexed: 02/07/2023] Open
Abstract
Many studies have demonstrated the power of gene expression profiles in cancer identification; however, the explosive growth of genomics data increases the need for tools that support cancer diagnosis and prognosis with high accuracy in a short time. Here, we collected 6136 human samples from 11 cancer types and integrated their gene expression profiles with a protein-protein interaction (PPI) network to generate 2D images using a spectral clustering method. To predict normal samples and 11 tumor types, the images of these 6136 human cancer networks were split into training and validation datasets to develop a convolutional neural network (CNN). Our model achieved accuracies of 97.4% in distinguishing normal samples from tumors and 95.4% in identifying the 11 cancer types. We also show that tumors located in neighboring tissues, or arising from the same cell types, lead the model to misclassify because of their similar gene expression profiles. Furthermore, we observed that some patients may have a better prognosis if their tumors are often misclassified as normal samples. As far as we know, we are the first to generate thousands of cancer networks to predict and classify multiple cancer types with a CNN architecture. We believe that our model can not only be applied to cancer diagnosis and prognosis but also promote the discovery of multiple cancer biomarkers.
7. Li S, Mai Z, Gu W, Ogbuehi AC, Acharya A, Pelekos G, Ning W, Liu X, Deng Y, Li H, Lethaus B, Savkovic V, Zimmerer R, Ziebolz D, Schmalz G, Wang H, Xiao H, Zhao J. Molecular Subtypes of Oral Squamous Cell Carcinoma Based on Immunosuppression Genes Using a Deep Learning Approach. Front Cell Dev Biol 2021; 9:687245. [PMID: 34422810 PMCID: PMC8375681 DOI: 10.3389/fcell.2021.687245] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/29/2021] [Accepted: 06/04/2021] [Indexed: 12/21/2022] Open
Abstract
Background: The mechanisms through which immunosuppressed patients bear increased risk and worse survival in oral squamous cell carcinoma (OSCC) are unclear. Here, we used deep learning to investigate the genetic mechanisms underlying immunosuppression in the survival of OSCC patients, especially from the aspect of various survival-related subtypes.
Materials and methods: OSCC samples data were obtained from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), and OSCC-related genetic datasets with survival data in the National Center for Biotechnology Information (NCBI). Immunosuppression genes (ISGs) were obtained from the HisgAtlas and DisGeNET databases. Survival analyses were performed to identify the ISGs with significant prognostic values in OSCC. A deep learning (DL)-based model was established for robustly differentiating the survival subpopulations of OSCC samples. In order to understand the characteristics of the different survival-risk subtypes of OSCC samples, differential expression analysis and functional enrichment analysis were performed.
Results: A total of 317 OSCC samples were divided into one inferring cohort (TCGA) and four confirmation cohorts (ICGC set, GSE41613, GSE42743, and GSE75538). Eleven ISGs (i.e., BGLAP, CALCA, CTLA4, CXCL8, FGFR3, HPRT1, IL22, ORMDL3, TLR3, SPHK1, and INHBB) showed prognostic value in OSCC. The DL-based model provided two optimal subgroups of TCGA-OSCC samples with significant differences (p = 4.91E-22) and good model fitness [concordance index (C-index) = 0.77]. The DL model was validated by using four external confirmation cohorts: ICGC cohort (n = 40, C-index = 0.39), GSE41613 dataset (n = 97, C-index = 0.86), GSE42743 dataset (n = 71, C-index = 0.87), and GSE75538 dataset (n = 14, C-index = 0.48). Importantly, subtype Sub1 demonstrated a lower probability of survival and thus a more aggressive nature compared with subtype Sub2. ISGs in subtype Sub1 were enriched in the tumor-infiltrating immune cells-related pathways and cancer progression-related pathways, while those in subtype Sub2 were enriched in the metabolism-related pathways.
Conclusion: The two survival subtypes of OSCC identified by deep learning can benefit clinical practitioners to divide immunocompromised patients with oral cancer into two subpopulations and give them target drugs and thus might be helpful for improving the survival of these patients and providing novel therapeutic strategies in the precision medicine area.
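The C-index values reported above can be read as the fraction of comparable patient pairs that a model ranks in the correct order. The following is a minimal sketch of Harrell's concordance index for right-censored survival data; the abstract does not state which estimator or software the authors used, so the function `concordance_index` and the toy data are assumptions made purely for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risks))
```

Under this definition a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a C-index below 0.5, as reported for two of the confirmation cohorts, indicates ranking worse than chance on those cohorts.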
Affiliation(s)
- Simin Li: Stomatological Hospital, Southern Medical University, Guangzhou, China
- Zhaoyi Mai: Stomatological Hospital, Southern Medical University, Guangzhou, China
- Wenli Gu: Stomatological Hospital, Southern Medical University, Guangzhou, China
- Aneesha Acharya: Dr. D. Y. Patil Dental College and Hospital, Dr. D. Y. Patil Vidyapeeth, Pune, India
- George Pelekos: Faculty of Dentistry, University of Hong Kong, Hong Kong, China
- Wanchen Ning: Stomatological Hospital, Southern Medical University, Guangzhou, China
- Xiangqiong Liu: Laboratory of Molecular Cell Biology, Beijing Tibetan Hospital, China Tibetology Research Center, Beijing, China
- Yupei Deng: Laboratory of Molecular Cell Biology, Beijing Tibetan Hospital, China Tibetology Research Center, Beijing, China
- Hanluo Li: Department of Cranio Maxillofacial Surgery, University Clinic Leipzig, Leipzig, Germany
- Bernd Lethaus: Department of Cranio Maxillofacial Surgery, University Clinic Leipzig, Leipzig, Germany
- Vuk Savkovic: Department of Cranio Maxillofacial Surgery, University Clinic Leipzig, Leipzig, Germany
- Rüdiger Zimmerer: Department of Cranio Maxillofacial Surgery, University Clinic Leipzig, Leipzig, Germany
- Dirk Ziebolz: Department of Cariology, Endodontology and Periodontology, University of Leipzig, Leipzig, Germany
- Gerhard Schmalz: Department of Cariology, Endodontology and Periodontology, University of Leipzig, Leipzig, Germany
- Hao Wang: Shanghai Tenth People’s Hospital, Tongji University, Shanghai, China
- Hui Xiao: Stomatological Hospital, Southern Medical University, Guangzhou, China
- Jianjiang Zhao: Shenzhen Stomatological Hospital, Southern Medical University, Shenzhen, China
8. Cheng Z, Liu L, Lin G, Yi C, Chu X, Liang Y, Zhou W, Jin X. ReHiC: Enhancing Hi-C data resolution via residual convolutional network. J Bioinform Comput Biol 2021; 19:2150001. [PMID: 33685371 DOI: 10.1142/s0219720021500013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
High-throughput chromosome conformation capture (Hi-C) is one of the most popular methods for studying the three-dimensional organization of genomes. However, Hi-C protocols can be expensive since they require large amounts of sample material and may be time-consuming. Most commonly used Hi-C data are low-resolution. Such data can only be used to identify large-scale genomic interactions and are not sufficient to identify small-scale patterns. We propose a novel deep learning-based computational approach (named ReHiC) that enhances the resolution of Hi-C data and allows us to obtain high-resolution Hi-C data at a relatively low cost. Our model requires only a 1/16 down-sampling of the original sequencing reads to predict higher-resolution Hi-C data, and its predictions are very close to high-resolution data in terms of numerical distribution and interaction distribution. More importantly, our framework stacks deeper and converges faster due to residual blocks in the core of the network. Extensive experiments show that ReHiC performs better than HiCPlus and HiCNN, two recently developed and frequently used methods for examining the spatial organization of chromatin structure in the cell. Moreover, the portability of our framework, verified by extensive experiments, shows that the trained model can also enhance the Hi-C matrix of other cell types efficiently. In conclusion, ReHiC offers more accurate high-resolution image reconstruction in a broad field.
Affiliation(s)
- Zhe Cheng: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
- Lin Liu: School of Information, Yunnan Normal University, Kunming 650000, China
- Guoliang Lin: State Key Laboratory for Conservation and Utilization of Bio-resource and School of Life Sciences, Yunnan University, Kunming 650000, China
- Chao Yi: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
- Xing Chu: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
- Yu Liang: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
- Wei Zhou: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
- Xin Jin: National Pilot School of Software, Yunnan University, Kunming 650000, China; Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
9. Nakashima S, Nacher JC, Song J, Akutsu T. An Overview of Bioinformatics Methods for Analyzing Autism Spectrum Disorders. Curr Pharm Des 2020; 25:4552-4559. [PMID: 31713477 DOI: 10.2174/1381612825666191111154837] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/27/2019] [Accepted: 11/07/2019] [Indexed: 02/06/2023]
Abstract
Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders that are well recognized to be biologically heterogeneous, with various associated factors, including genetic, metabolic, and environmental ones. Despite their high prevalence, only a few drugs have been approved for the treatment of ASD. Therefore, extensive studies have been conducted to identify ASD risk genes and novel drug targets. Since many genes and many other factors are associated with ASD, various bioinformatics methods have also been developed for the analysis of ASD. In this paper, we review bioinformatics methods for analyzing ASD data with a focus on computational aspects. We classify existing methods into two categories: (i) methods based on genomic variants and gene expression data, and (ii) methods using biological networks, which include gene co-expression networks and protein-protein interaction networks. Next, for each method, we provide an overall flow and elaborate on the computational techniques used. We also briefly review other approaches and discuss possible future directions and strategies for developing bioinformatics approaches to analyze ASD.
Affiliation(s)
- Shogo Nakashima: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Jose C Nacher: Department of Information Science, Faculty of Science, Toho University, Kyoto, Japan
- Jiangning Song: Monash Biomedicine Discovery Institute, Monash University, Clayton VIC 3800, Australia
- Tatsuya Akutsu: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
10. Chu YW, Lee YH. Selected Papers from the International Workshop on Cancer Bioinformatics and Intelligent Medicine (CBIM2018). J Bioinform Comput Biol 2019; 17:1902002. [PMID: 31189411 DOI: 10.1142/s0219720019020025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]