1
Khosravi M, Jasemi SK, Hayati P, Javar HA, Izadi S, Izadi Z. Transformative artificial intelligence in gastric cancer: Advancements in diagnostic techniques. Comput Biol Med 2024; 183:109261. [PMID: 39488054 DOI: 10.1016/j.compbiomed.2024.109261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 09/30/2024] [Accepted: 10/07/2024] [Indexed: 11/04/2024]
Abstract
Gastric cancer represents a significant global health challenge with elevated incidence and mortality rates, highlighting the need for advancements in diagnostic and therapeutic strategies. This review paper addresses the critical need for a thorough synthesis of the role of artificial intelligence (AI) in the management of gastric cancer. It provides an in-depth analysis of current AI applications, focusing on their contributions to early diagnosis, treatment planning, and outcome prediction. The review identifies key gaps and limitations in the existing literature by examining recent studies and technological developments. It aims to clarify the evolution of AI-driven methods and their impact on enhancing diagnostic accuracy, personalizing treatment strategies, and improving patient outcomes. The paper emphasizes the transformative potential of AI in overcoming the challenges associated with gastric cancer management and proposes future research directions to further harness AI's capabilities. Through this synthesis, the review underscores the importance of integrating AI technologies into clinical practice to revolutionize gastric cancer management.
Affiliation(s)
- Mobina Khosravi
  - Student Research Committee, Kermanshah University of Medical Sciences, Kermanshah, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
- Seyedeh Kimia Jasemi
  - Student Research Committee, Kermanshah University of Medical Sciences, Kermanshah, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
- Parsa Hayati
  - Student Research Committee, Kermanshah University of Medical Sciences, Kermanshah, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
- Hamid Akbari Javar
  - Department of Pharmaceutics, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
- Saadat Izadi
  - Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
- Zhila Izadi
  - Pharmaceutical Sciences Research Center, Health Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran; USERN Office, Kermanshah University of Medical Sciences, Kermanshah, Iran.
2
Gao Y, Cui Y. Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement. Genome Med 2024; 16:76. [PMID: 38835075 PMCID: PMC11149372 DOI: 10.1186/s13073-024-01345-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 05/17/2024] [Indexed: 06/06/2024] Open
Abstract
BACKGROUND Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets. METHODS We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups. RESULTS Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations. CONCLUSIONS This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
Affiliation(s)
- Yan Gao
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
  - Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
- Yan Cui
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
  - Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
  - Center for Cancer Research, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
3
Hadad E, Rokach L, Veksler-Lublinsky I. Empowering prediction of miRNA-mRNA interactions in species with limited training data through transfer learning. Heliyon 2024; 10:e28000. [PMID: 38560149 PMCID: PMC10981012 DOI: 10.1016/j.heliyon.2024.e28000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Revised: 03/06/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024] Open
Abstract
MicroRNAs (miRNAs) play a crucial role in mRNA regulation. Identifying functionally important mRNA targets of a specific miRNA is essential for uncovering its biological function and assisting miRNA-based drug development. Datasets of high-throughput direct bona fide miRNA-target interactions (MTIs) exist only for a few model organisms, prompting the need for computational prediction. However, the scarcity of data poses a challenge in training accurate machine learning models for MTI prediction. In this study, we explored the potential of transfer learning techniques (with ANN and XGB) to address the limited data challenge by leveraging the similarities in interaction rules between species. Furthermore, we introduced a novel approach called TransferSHAP for estimating the feature importance of transfer learning in tabular dataset tasks. We demonstrated that transfer learning improves MTI prediction accuracy for species with limited datasets and identified the specific interaction features the models employed to transfer information across different species.
Affiliation(s)
- Eyal Hadad
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
- Lior Rokach
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
- Isana Veksler-Lublinsky
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, David Ben-Gurion Blvd. 1, Beer-Sheva 8410501, Israel
4
Gouzou D, Taimori A, Haloubi T, Finlayson N, Wang Q, Hopgood JR, Vallejo M. Applications of machine learning in time-domain fluorescence lifetime imaging: a review. Methods Appl Fluoresc 2024; 12:022001. [PMID: 38055998 PMCID: PMC10851337 DOI: 10.1088/2050-6120/ad12f7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/25/2023] [Accepted: 12/06/2023] [Indexed: 12/08/2023]
Abstract
Many medical imaging modalities have benefited from recent advances in Machine Learning (ML), specifically in deep learning, such as neural networks. Computers can be trained to investigate and enhance medical imaging methods without using valuable human resources. In recent years, Fluorescence Lifetime Imaging (FLIm) has received increasing attention from the ML community. FLIm goes beyond conventional spectral imaging, providing additional lifetime information, and could lead to optical histopathology supporting real-time diagnostics. However, most current studies do not use the full potential of machine/deep learning models. As a developing image modality, FLIm data are not easily obtainable, which, coupled with an absence of standardisation, is pushing back the research to develop models which could advance automated diagnosis and help promote FLIm. In this paper, we describe recent developments that improve FLIm image quality, specifically time-domain systems, and we summarise sensing, signal-to-noise analysis and the advances in registration and low-level tracking. We review the two main applications of ML for FLIm: lifetime estimation and image analysis through classification and segmentation. We suggest a course of action to improve the quality of ML studies applied to FLIm. Our final goal is to promote FLIm and attract more ML practitioners to explore the potential of lifetime imaging.
Affiliation(s)
- Dorian Gouzou
  - Dorian Gouzou and Marta Vallejo are with Institute of Signals, Sensors and Systems, School of Engineering and Physical Sciences, Heriot Watt University, Edinburgh, EH14 4AS, United Kingdom
- Ali Taimori
  - Tarek Haloubi, Ali Taimori, and James R. Hopgood are with Institute for Imaging, Data and Communication, School of Engineering, University of Edinburgh, Edinburgh, EH9 3FG, United Kingdom
- Tarek Haloubi
  - Tarek Haloubi, Ali Taimori, and James R. Hopgood are with Institute for Imaging, Data and Communication, School of Engineering, University of Edinburgh, Edinburgh, EH9 3FG, United Kingdom
- Neil Finlayson
  - Neil Finlayson is with Institute for Integrated Micro and Nano Systems, School of Engineering, University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
- Qiang Wang
  - Qiang Wang is with Centre for Inflammation Research, University of Edinburgh, Edinburgh, EH16 4TJ, United Kingdom
- James R Hopgood
  - Tarek Haloubi, Ali Taimori, and James R. Hopgood are with Institute for Imaging, Data and Communication, School of Engineering, University of Edinburgh, Edinburgh, EH9 3FG, United Kingdom
- Marta Vallejo
  - Dorian Gouzou and Marta Vallejo are with Institute of Signals, Sensors and Systems, School of Engineering and Physical Sciences, Heriot Watt University, Edinburgh, EH14 4AS, United Kingdom
5
Sheng M, Qi Y, Gao Z, Lin X. Analyzing omics data based on sample network. J Bioinform Comput Biol 2024; 22:2450002. [PMID: 38567387 DOI: 10.1142/s0219720024500021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Identifying valuable features from complex omics data is of great significance for disease diagnosis studies. This paper proposes a new feature selection algorithm based on sample network (FS-SN) to mine important information from omics data. The sample network is constructed according to the sample neighbor relationship at the molecular (feature) expression level, and the distinguishing ability of the feature is evaluated based on the topology of the sample network. The sample network established on a feature with a strong discriminating ability tends to have many edges between samples of the same group and few edges between samples of different groups. At the same time, FS-SN removes redundant features according to the gravitational interaction between features. To validate FS-SN, it was compared on ten public datasets with ERGS, mRMR, ReliefF, ATSD-DN, and INDEED, which are efficient in omics data analysis. Experimental results show that FS-SN performed better than the compared methods in accuracy, sensitivity, and specificity in most cases. Hence, FS-SN, which makes use of the topology of the sample network, is effective for analyzing omics data: it can identify key features that reflect the occurrence and development of diseases and reveal the underlying biological mechanism.
Affiliation(s)
- Meizhen Sheng
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Yanpeng Qi
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Zhenbo Gao
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
6
Zhong NN, Wang HQ, Huang XY, Li ZZ, Cao LM, Huo FY, Liu B, Bu LL. Enhancing head and neck tumor management with artificial intelligence: Integration and perspectives. Semin Cancer Biol 2023; 95:52-74. [PMID: 37473825 DOI: 10.1016/j.semcancer.2023.07.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 04/29/2023] [Revised: 07/11/2023] [Accepted: 07/15/2023] [Indexed: 07/22/2023]
Abstract
Head and neck tumors (HNTs) constitute a multifaceted ensemble of pathologies that primarily involve regions such as the oral cavity, pharynx, and nasal cavity. The intricate anatomical structure of these regions poses considerable challenges to efficacious treatment strategies. Despite the availability of myriad treatment modalities, the overall therapeutic efficacy for HNTs continues to remain subdued. In recent years, the deployment of artificial intelligence (AI) in healthcare practices has garnered noteworthy attention. AI modalities, inclusive of machine learning (ML), neural networks (NNs), and deep learning (DL), when amalgamated into the holistic management of HNTs, promise to augment the precision, safety, and efficacy of treatment regimens. The integration of AI within HNT management is intricately intertwined with domains such as medical imaging, bioinformatics, and medical robotics. This article intends to scrutinize the cutting-edge advancements and prospective applications of AI in the realm of HNTs, elucidating AI's indispensable role in prevention, diagnosis, treatment, prognostication, research, and inter-sectoral integration. The overarching objective is to stimulate scholarly discourse and invigorate insights among medical practitioners and researchers to propel further exploration, thereby facilitating superior therapeutic alternatives for patients.
Affiliation(s)
- Nian-Nian Zhong
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Han-Qi Wang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Xin-Yue Huang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Zi-Zhan Li
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Lei-Ming Cao
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Fang-Yi Huo
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Bing Liu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China; Department of Oral & Maxillofacial - Head Neck Oncology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China.
- Lin-Lin Bu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China; Department of Oral & Maxillofacial - Head Neck Oncology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China.
7
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Teena Sharma
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Yan Cui
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
8
Keshta I, Deshpande PS, Shabaz M, Soni M, Bhadla MK, Muhammed Y. RETRACTED ARTICLE: Multi-stage biomedical feature selection extraction algorithm for cancer detection. SN APPLIED SCIENCES 2023; 5:131. [DOI: 10.1007/s42452-023-05339-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2023] [Accepted: 03/13/2023] [Indexed: 12/09/2024] Open
Abstract
Cancer is a significant cause of death worldwide. Early cancer detection is greatly aided by applying machine learning and artificial intelligence (AI) to gene microarray data sets (microarray data). Despite this, there is a significant discrepancy between the number of gene features in the microarray data set and the number of samples. Because of this, it is crucial to identify markers for gene array data. Existing feature selection algorithms, however, are generally long-standing, are limited to single-condition feature selection, and rarely take feature extraction into account. This work proposes a Multi-stage algorithm for Biomedical Deep Feature Selection (MBDFS) to address this issue. In the first stage, three feature selection techniques are combined for thorough feature selection, and feature subsets are obtained; in the second stage, an unsupervised neural network is used to create the best representation of the feature subset to enhance final classification accuracy. Using a variety of metrics, including a comparison of classification results before and after feature selection and the performance of alternative feature selection methods, we evaluate MBDFS's efficacy. The experiments demonstrate that although MBDFS uses fewer features, classification accuracy is either unchanged or enhanced.
9
Li S, Zhang L, Tony Cai T, Li H. Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer. J Am Stat Assoc 2023; 119:1274-1285. [PMID: 38948492 PMCID: PMC11213555 DOI: 10.1080/01621459.2023.2184373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Accepted: 02/15/2023] [Indexed: 03/06/2023]
Abstract
Transfer learning provides a powerful tool for incorporating data from related studies into a target study of interest. In epidemiology and medical studies, the classification of a target disease could borrow information across other related diseases and populations. In this work, we consider transfer learning for high-dimensional generalized linear models (GLMs). A novel algorithm, TransHDGLM, that integrates data from the target study and the source studies is proposed. Minimax rate of convergence for estimation is established and the proposed estimator is shown to be rate-optimal. Statistical inference for the target regression coefficients is also studied. Asymptotic normality for a debiased estimator is established, which can be used for constructing coordinate-wise confidence intervals of the regression coefficients. Numerical studies show significant improvement in estimation and inference accuracy over GLMs that only use the target data. The proposed methods are applied to a real data study concerning the classification of colorectal cancer using gut microbiomes, and are shown to enhance the classification accuracy in comparison to methods that only use the target data.
Affiliation(s)
- Sai Li
  - Institute of Statistics and Big Data, Renmin University of China, China
- Linjun Zhang
  - Department of Statistics, Rutgers University, New Brunswick, NJ 08854
- T Tony Cai
  - Department of Statistics, the Wharton School, University of Pennsylvania, Philadelphia, PA 19104
- Hongzhe Li
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
10
Latyshev P, Pavlov F, Herbert A, Poptsova M. Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals. Front Big Data 2023; 6:1140663. [PMID: 37063486 PMCID: PMC10101332 DOI: 10.3389/fdata.2023.1140663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 03/14/2023] [Indexed: 04/03/2023] Open
Abstract
Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods to analyze the available data, but the focus is often only on the species studied. Here we take advantage of progress in Transfer Learning in the area of Unsupervised Domain Adaptation (UDA) and test nine UDA methods for prediction of regulatory code signals for genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species, then refining the model using the genome sequence of the target species for which we wanted to make predictions. Among the nine tested domain adaptation architectures, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others. Conditional Domain Adversarial Network (CDAN) was the third-best architecture. Here we provide an empirical assessment of each approach using real-world data. The different approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks on human and mouse genomes, but are generalizable to any cross-species transfer of interest. We tested the efficiency of each method using species where experimental data were available for both. The results allow us to assess how well each implementation will work for species for which only limited experimental data are available and will inform the design of future experiments in these understudied organisms. Overall, our results proved the validity of UDA methods for generation of missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy, and susceptible to analytic bias.
Affiliation(s)
- Pavel Latyshev
  - Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Fedor Pavlov
  - Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Alan Herbert
  - Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
  - InsideOutBio, Charlestown, MA, United States
- Maria Poptsova
  - Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
11
Keerthana R, Gladston A, Nehemiah HK. Transfer learning-based CNN diagnostic framework for diagnosis of COVID-19 from lung CT images. THE IMAGING SCIENCE JOURNAL 2023. [DOI: 10.1080/13682199.2023.2170768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Affiliation(s)
- R. Keerthana
  - Department of Computer Science and Engineering, Anna University Chennai, Chennai, India
- Angelin Gladston
  - Department of Computer Science and Engineering, Anna University Chennai, Chennai, India
12
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
13
Lin YC, Lin Y, Huang YL, Ho CY, Chiang HJ, Lu HY, Wang CC, Wang JJ, Ng SH, Lai CH, Lin G. Generalizable transfer learning of automated tumor segmentation from cervical cancers toward a universal model for uterine malignancies in diffusion-weighted MRI. Insights Imaging 2023; 14:14. [PMID: 36690870 PMCID: PMC9871146 DOI: 10.1186/s13244-022-01356-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Accepted: 12/04/2022] [Indexed: 01/25/2023] Open
Abstract
PURPOSE To investigate the generalizability of transfer learning (TL) of automated tumor segmentation from cervical cancers toward a universal model for cervical and uterine malignancies in diffusion-weighted magnetic resonance imaging (DWI). METHODS In this retrospective multicenter study, we analyzed pelvic DWI data from 169 and 320 patients with cervical and uterine malignancies and divided them into the training (144 and 256) and testing (25 and 64) datasets, respectively. A pretrained model was established using DeepLab V3+ from the cervical cancer dataset, followed by TL experiments adjusting the training data sizes and fine-tuning layers. The model performance was evaluated using the Dice similarity coefficient (DSC). RESULTS In predicting tumor segmentation for all cervical and uterine malignancies, TL models improved the DSCs from the pretrained cervical model (DSC 0.43) when adding 5, 13, 26, and 51 uterine cases for training (DSC improved from 0.57, 0.62, 0.68, 0.70, p < 0.001). Following the crossover at adding 128 cases (DSC 0.71), the model trained by combining data from adding all the 256 patients exhibited the highest DSCs for the combined cervical and uterine datasets (DSC 0.81) and cervical only dataset (DSC 0.91). CONCLUSIONS TL may improve the generalizability of automated tumor segmentation of DWI from a specific cancer type toward multiple types of uterine malignancies, especially when case numbers are limited.
Affiliation(s)
- Yu-Chun Lin
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan, 33302 Taiwan
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Yenpo Lin
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Yen-Ling Huang
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Chih-Yi Ho
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Hsin-Ju Chiang
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Hsin-Ying Lu
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Chun-Chieh Wang
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan, 33302 Taiwan
- Department of Radiation Oncology, Chang Gung Memorial Hospital at Linkou and Chang Gung University, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Jiun-Jie Wang
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taoyuan, 33302 Taiwan
- Shu-Hang Ng
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Chyong-Huey Lai
- Gynecologic Cancer Research Center, Department of Obstetrics and Gynecology, Chang Gung Memorial Hospital at Linkou and Chang Gung University, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Gigin Lin
- Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou and Keelung, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
- Gynecologic Cancer Research Center, Department of Obstetrics and Gynecology, Chang Gung Memorial Hospital at Linkou and Chang Gung University, 5 Fuhsing St., Guishan, Taoyuan, 33382 Taiwan
14
He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023; 88:187-200. [PMID: 36596352 DOI: 10.1016/j.semcancer.2022.12.009] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 11/17/2022] [Revised: 12/16/2022] [Accepted: 12/29/2022] [Indexed: 01/02/2023]
Abstract
With biotechnological advancements, innovative omics technologies are constantly emerging that have enabled researchers to access multi-layer information from the genome, epigenome, transcriptome, proteome, metabolome, and more. A wealth of omics technologies, including bulk and single-cell omics approaches, have empowered to characterize different molecular layers at unprecedented scale and resolution, providing a holistic view of tumor behavior. Multi-omics analysis allows systematic interrogation of various molecular information at each biological layer while posing tricky challenges regarding how to extract valuable insights from the exponentially increasing amount of multi-omics data. Therefore, efficient algorithms are needed to reduce the dimensionality of the data while simultaneously dissecting the mysteries behind the complex biological processes of cancer. Artificial intelligence has demonstrated the ability to analyze complementary multi-modal data streams within the oncology realm. The coincident development of multi-omics technologies and artificial intelligence algorithms has fuelled the development of cancer precision medicine. Here, we present state-of-the-art omics technologies and outline a roadmap of multi-omics integration analysis using an artificial intelligence strategy. The advances made using artificial intelligence-based multi-omics approaches are described, especially concerning early cancer screening, diagnosis, response assessment, and prognosis prediction. Finally, we discuss the challenges faced in multi-omics analysis, along with tentative future trends in this field. With the increasing application of artificial intelligence in multi-omics analysis, we anticipate a shifting paradigm in precision medicine becoming driven by artificial intelligence-based multi-omics technologies.
Affiliation(s)
- Xiujing He
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Xiaowei Liu
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Fengli Zuo
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Hubing Shi
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Jing Jing
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China.
15
Gupta S, Gupta MK, Shabaz M, Sharma A. Deep learning techniques for cancer classification using microarray gene expression data. Front Physiol 2022; 13:952709. [PMID: 36246115 PMCID: PMC9563992 DOI: 10.3389/fphys.2022.952709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 09/01/2022] [Indexed: 11/28/2022]
Abstract
Cancer is one of the top causes of death globally. Recently, microarray gene expression data has been used to aid in the effective and early detection of cancer. The use of DNA microarray technology to uncover information from the expression levels of thousands of genes has enormous promise. The DNA microarray technique can determine the levels of thousands of genes simultaneously in a single experiment. The analysis of gene expression is critical in many disciplines of biological study to obtain the necessary information. This study analyses all the research studies focused on optimizing gene selection for cancer detection using artificial intelligence. One of the most challenging issues is figuring out how to extract meaningful information from massive databases. Deep learning architectures have performed efficiently in numerous sectors and are used to diagnose many other chronic diseases and to assist physicians in making medical decisions. In this study, we have evaluated the results of different optimizers on an RNA sequence dataset. The deep learning algorithm proposed in the study classifies five different forms of cancer: kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and colon adenocarcinoma (COAD). The performance of different optimizers, namely Stochastic Gradient Descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient Optimizer (AdaGrad), and Adaptive Momentum (AdaM), is compared. The experimental results gathered on the dataset affirm that AdaGrad and Adam. Also, the performance analysis has been done using different learning rates and decay rates. This study discusses current advancements in deep learning-based gene expression data analysis using optimized feature selection methods.
Affiliation(s)
- Surbhi Gupta
- Department of Computer Science and Engineering Department, SMVDU, Jammu, India
- Model Institute of Engineering and Technology, Jammu, India
- Manoj K. Gupta
- Department of Computer Science and Engineering Department, SMVDU, Jammu, India
- Ashutosh Sharma
- School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
16
Dodlapati S, Jiang Z, Sun J. Completing Single-Cell DNA Methylome Profiles via Transfer Learning Together With KL-Divergence. Front Genet 2022; 13:910439. [PMID: 35938031 PMCID: PMC9353187 DOI: 10.3389/fgene.2022.910439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 05/25/2022] [Indexed: 11/13/2022]
Abstract
The high level of sparsity in methylome profiles obtained using whole-genome bisulfite sequencing when the amount of biological material is low limits its value in the study of systems in which large samples are difficult to assemble, such as mammalian preimplantation embryonic development. The recently developed computational methods for addressing the sparsity by imputing missing values have their limits when the required minimum data coverage or profiles of the same tissue in other modalities are not available. In this study, we explored the use of transfer learning together with Kullback-Leibler (KL) divergence to train predictive models for completing methylome profiles with very low coverage (below 2%). Transfer learning was used to leverage less sparse profiles that are typically available for different tissues of the same species, while KL divergence was employed to maximize the usage of information carried in the input data. A deep neural network was adopted to extract both DNA sequence and local methylation patterns for imputation. Our study of training models for completing methylome profiles of bovine oocytes and early embryos demonstrates the effectiveness of transfer learning and KL divergence, with individual increases of 29.98 and 29.43%, respectively, in prediction performance and a 38.70% increase when the two were used together. The drastically increased data coverage (43.80-73.6%) after imputation powers downstream analyses involving methylomes that cannot be effectively done using the very low coverage profiles (0.06-1.47%) before imputation.
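For readers unfamiliar with the KL divergence mentioned above, the following is a minimal sketch of its generic definition for discrete distributions. The abstract does not say how the divergence enters the training objective, so applying it to a per-site methylation level, as in the usage lines, is an assumption made purely for illustration; the function name and the `eps` guard are likewise hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is clamped to `eps`
    (an illustrative safeguard) so the logarithm stays finite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    # Hypothetical example: an observed methylation level of 0.8 at one
    # CpG site compared with a predicted level of 0.6, each treated as a
    # two-outcome (methylated, unmethylated) distribution.
    observed = [0.8, 0.2]
    predicted = [0.6, 0.4]
    print(kl_divergence(observed, predicted))
```

The divergence is zero only when the two distributions agree, which is why it can serve as a loss on fractional (rather than binarized) methylation levels.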
Affiliation(s)
- Sanjeeva Dodlapati
- Department of Computer Science, Old Dominion University, Norfolk, VA, United States
- Zongliang Jiang
- School of Animal Sciences, AgCenter, Louisiana State University, Baton Rouge, LA, United States
- Jiangwen Sun
- Department of Computer Science, Old Dominion University, Norfolk, VA, United States
17
Mondol RK, Truong ND, Reza M, Ippolito S, Ebrahimie E, Kavehei O. AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-Types and Extracting Biologically Relevant Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2060-2070. [PMID: 33720833 DOI: 10.1109/tcbb.2021.3066086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Technological advancements in high-throughput genomics enable the generation of complex and large data sets that can be used for classification, clustering, and bio-marker identification. Modern deep learning algorithms provide us with the opportunity of finding most significant features in such huge dataset to characterize diseases (e.g., cancer) and their sub-types. Thus, developing such deep learning method, which can successfully extract meaningful features from various breast cancer sub-types, is of current research interest. In this paper, we develop dual stage (unsupervised pre-training and supervised fine-tuning) neural network architecture termed AFExNet based on adversarial auto-encoder (AAE) to extract features from high dimensional genetic data. We evaluated the performance of our model through twelve different supervised classifiers to verify the usefulness of the new features using public RNA-Seq dataset of breast cancer. AFExNet provides consistent results in all performance metrics across twelve different classifiers which makes our model classifier independent. We also develop a method named 'TopGene' to find highly weighted genes from the latent space which could be useful for finding cancer bio-markers. Put together, AFExNet has great potential for biological data to accurately and effectively extract features. Our work is fully reproducible and source code can be downloaded from Github: https://github.com/NeuroSyd/breast-cancer-sub-types.
18
Trastulla L, Noorbakhsh J, Vazquez F, McFarland J, Iorio F. Computational estimation of quality and clinical relevance of cancer cell lines. Mol Syst Biol 2022; 18:e11017. [PMID: 35822563 PMCID: PMC9277610 DOI: 10.15252/msb.202211017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2022] [Revised: 06/10/2022] [Accepted: 06/13/2022] [Indexed: 12/12/2022]
Abstract
Immortal cancer cell lines (CCLs) are the most widely used system for investigating cancer biology and for the preclinical development of oncology therapies. Pharmacogenomic and genome-wide editing screenings have facilitated the discovery of clinically relevant gene-drug interactions and novel therapeutic targets via large panels of extensively characterised CCLs. However, tailoring pharmacological strategies in a precision medicine context requires bridging the existing gaps between tumours and in vitro models. Indeed, intrinsic limitations of CCLs such as misidentification, the absence of tumour microenvironment and genetic drift have highlighted the need to identify the most faithful CCLs for each primary tumour while addressing their heterogeneity, with the development of new models where necessary. Here, we discuss the most significant limitations of CCLs in representing patient features, and we review computational methods aiming at systematically evaluating the suitability of CCLs as tumour proxies and identifying the best patient representative in vitro models. Additionally, we provide an overview of the applications of these methods to more complex models and discuss future machine-learning-based directions that could resolve some of the arising discrepancies.
Affiliation(s)
- Javad Noorbakhsh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Present address: Kojin Therapeutics, Boston, MA, USA
- Francisca Vazquez
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
19
Gao Y, Cui Y. Clinical time-to-event prediction enhanced by incorporating compatible related outcomes. PLOS Digital Health 2022; 1:e0000038. [PMID: 35757279 PMCID: PMC9222982 DOI: 10.1371/journal.pdig.0000038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 06/15/2023]
Abstract
Accurate time-to-event (TTE) prediction of clinical outcomes from personal biomedical data is essential for precision medicine. It has become increasingly common that clinical datasets contain information for multiple related patient outcomes from comorbid diseases or multifaceted endpoints of a single disease. Various TTE models have been developed to handle competing risks that are related to mutually exclusive events. However, clinical outcomes are often non-competing and can occur at the same time or sequentially. Here we develop TTE prediction models with the capacity of incorporating compatible related clinical outcomes. We test our method on real and synthetic data and find that the incorporation of related auxiliary clinical outcomes can: 1) significantly improve the TTE prediction performance of conventional Cox model while maintaining its interpretability; 2) further improve the performance of the state-of-the-art deep learning based models. While the auxiliary outcomes are utilized for model training, the model deployment is not limited by the availability of the auxiliary outcome data because the auxiliary outcome information is not required for the prediction of the primary outcome once the model is trained.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Yan Cui
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Center for Cancer Research, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
20
|
Liu Z, Wang R, Zhang W. Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis. Med Biol Eng Comput 2022; 60:1055-1073. [DOI: 10.1007/s11517-022-02522-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Accepted: 01/30/2022] [Indexed: 10/19/2022]
21
Prokaryotic and eukaryotic promoters identification based on residual network transfer learning. Bioprocess Biosyst Eng 2022; 45:955-967. [DOI: 10.1007/s00449-022-02716-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 02/27/2022] [Indexed: 11/26/2022]
22
High-dimensional role of AI and machine learning in cancer research. Br J Cancer 2022; 126:523-532. [PMID: 35013580 PMCID: PMC8854697 DOI: 10.1038/s41416-021-01689-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/23/2021] [Revised: 11/23/2021] [Accepted: 12/23/2021] [Indexed: 01/12/2023]
Abstract
The role of Artificial Intelligence and Machine Learning in cancer research offers several advantages, primarily scaling up the information processing and increasing the accuracy of the clinical decision-making. The key enabling tools currently in use in Precision, Digital and Translational Medicine, here named as 'Intelligent Systems' (IS), leverage unprecedented data volumes and aim to model their underlying heterogeneous influences and variables correlated with patients' outcomes. As functionality and performance of IS are associated with complex diagnosis and therapy decisions, a rich spectrum of patterns and features detected in high-dimensional data may be critical for inference purposes. Many challenges are also present in such discovery task. First, the generation of interpretable model results from a mix of structured and unstructured input information. Second, the design, and implementation of automated clinical decision processes for drawing disease trajectories and patient profiles. Ultimately, the clinical impacts depend on the data effectively subjected to steps such as harmonisation, integration, validation, etc. The aim of this work is to discuss the transformative value of IS applied to multimodal data acquired through various interrelated cancer domains (high-throughput genomics, experimental biology, medical image processing, radiomics, patient electronic records, etc.).
23
Kakati T, Bhattacharyya DK, Kalita JK, Norden-Krichmar TM. DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning. BMC Bioinformatics 2022; 23:17. [PMID: 34991439 PMCID: PMC8734099 DOI: 10.1186/s12859-021-04527-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Accepted: 12/13/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND A limitation of traditional differential expression analysis on small datasets involves the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state-of-the-art in disease biomarker prediction from RNA-seq data using DL. However, application of DL to RNA-seq data is challenging due to the absence of appropriate labels and the small sample size compared to the number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. RESULTS We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer. CONCLUSIONS DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.
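The logarithmic fold change that the abstract mentions as a labelling signal can be sketched as follows. This is only an illustration of the general quantity: the pseudocount, the use of group means, the zero threshold, and the function names are assumptions of this sketch, and the abstract does not describe DEGnext's actual labelling procedure.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor expression to mean normal expression.

    The pseudocount (assumed value 1.0) avoids division by zero and
    log(0) for genes with no reads in one group.
    """
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def label_gene(tumor, normal):
    """Label a differentially expressed gene as upregulated or downregulated.

    Assumes the gene is already known to be differentially expressed,
    so a fold change of exactly zero is not handled specially.
    """
    return "UR" if log2_fold_change(tumor, normal) > 0 else "DR"


if __name__ == "__main__":
    # Hypothetical expression values for one gene in two sample groups.
    print(log2_fold_change([7.0, 7.0], [3.0, 3.0]), label_gene([7.0, 7.0], [3.0, 3.0]))
```

In practice such labels would be combined with a significance test before being used as training targets; that step is omitted here.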
Affiliation(s)
- Tulika Kakati
- Department of Epidemiology and Biostatistics, University of California, Irvine, Irvine, CA, USA
- Department of Computer Science, Tezpur University, Assam, India
- Jugal K Kalita
- Department of Computer Science, University of Colorado, Colorado Springs, Colorado Springs, CO, USA
- Trina M Norden-Krichmar
- Department of Epidemiology and Biostatistics, University of California, Irvine, Irvine, CA, USA
24
|
Mahapatra S, Gupta VR, Sahu SS, Panda G. Deep Neural Network and Extreme Gradient Boosting Based Hybrid Classifier for Improved Prediction of Protein-Protein Interaction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:155-165. [PMID: 33621179 DOI: 10.1109/tcbb.2021.3061300] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
To understand the behavioral processes of life and disease-causing mechanisms, knowledge of protein-protein interactions (PPI) is essential. In this paper, a novel hybrid approach combining a deep neural network (DNN) and an extreme gradient boosting classifier (XGB) is employed for predicting PPI. The hybrid classifier (DNN-XGB) uses a fusion of three sequence-based features, amino acid composition (AAC), conjoint triad composition (CT), and local descriptor (LD), as inputs. The DNN extracts the hidden information through a layer-wise abstraction from the raw features, and the result is passed through the XGB classifier. The 5-fold cross-validation accuracies for the intraspecies interaction datasets of Saccharomyces cerevisiae (core subset), Helicobacter pylori, Saccharomyces cerevisiae, and Human are 98.35, 96.19, 97.37, and 99.74 percent, respectively. Similarly, accuracies of 98.50 and 97.25 percent are achieved for the interspecies interaction datasets of Human-Bacillus anthracis and Human-Yersinia pestis, respectively. The improved prediction accuracies obtained on the independent test sets and network datasets indicate that the DNN-XGB can be used to predict cross-species interactions. It can also provide new insights into signaling pathway analysis, predicting drug targets, and understanding disease pathogenesis. Improved performance of the proposed method suggests that the hybrid classifier can be used as a useful tool for PPI prediction. The datasets and source codes are available at: https://github.com/SatyajitECE/DNN-XGB-for-PPI-Prediction.
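Of the three sequence-based features named above, amino acid composition is the simplest to illustrate. The sketch below computes the standard 20-dimensional frequency vector and concatenates the vectors of two proteins into one pair feature; the function names, the skipping of non-standard residue codes, and the concatenation step are assumptions of this sketch rather than details confirmed by the abstract.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard codes (e.g. 'X') are ignored,
    which is an illustrative choice.
    """
    sequence = sequence.upper()
    counts = [sequence.count(residue) for residue in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]


def pair_feature(seq_a, seq_b):
    """Concatenate the AAC vectors of two proteins into one 40-dim feature."""
    return amino_acid_composition(seq_a) + amino_acid_composition(seq_b)


if __name__ == "__main__":
    # Hypothetical short peptide sequences, for illustration only.
    print(len(pair_feature("MKTAYIAK", "GAVLIMC")))
```

The other two descriptors (conjoint triad and local descriptor) encode local sequence order and would be appended to the same vector before it is fed to a classifier.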
25
Kedra J, Davergne T, Braithwaite B, Servy H, Gossec L. Machine learning approaches to improve disease management of patients with rheumatoid arthritis: review and future directions. Expert Rev Clin Immunol 2021; 17:1311-1321. [PMID: 34890271 DOI: 10.1080/1744666x.2022.2017773] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Although the management of rheumatoid arthritis (RA) has improved in a major way over the last decades, this disease still leads to an important burden for patients and society, and there is a need to develop more personalized approaches. Machine learning (ML) methods are increasingly used in health-related studies and can be applied to different sorts of data (clinical, radiological, or 'omics' data). Such approaches may improve the management of patients with RA. AREAS COVERED In this paper, we propose a review of ML approaches applied to RA. A scoping literature search was performed in PubMed in September 2021 using the following MeSH terms: 'arthritis, rheumatoid' and 'machine learning'. Based on this search, the usefulness of ML methods for RA diagnosis, monitoring, and prediction of response to treatment and RA outcomes is discussed. EXPERT OPINION ML methods have the potential to revolutionize RA-related research and improve disease management and patient care. Nevertheless, these models are not yet ready to contribute fully to rheumatologists' daily practice. Indeed, these methods raise technical, methodological, and ethical issues, which should be addressed properly to allow their implementation. Collaboration between data scientists, clinical researchers, and physicians is therefore required to move this field forward.
Affiliation(s)
- Joanna Kedra
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, Paris, France
- Rheumatology Department, Pitié-Salpêtrière Hospital, AP-HP, Paris, France
- Thomas Davergne
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, Paris, France
- Laure Gossec
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, Paris, France
- Rheumatology Department, Pitié-Salpêtrière Hospital, AP-HP, Paris, France
26
|
Gao K, Fan Z, Su J, Zeng LL, Shen H, Zhu J, Hu D. Deep Transfer Learning for Cerebral Cortex Using Area-Preserving Geometry Mapping. Cereb Cortex 2021; 32:2972-2984. [PMID: 34791082 DOI: 10.1093/cercor/bhab394] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/20/2021] [Revised: 09/01/2021] [Accepted: 10/03/2021] [Indexed: 01/17/2023]
Abstract
Limited sample size hinders the application of deep learning in brain image analysis, and transfer learning is a possible solution. However, most pretrained models are 2D based and cannot be applied directly to 3D brain images. In this study, we propose a novel framework to apply 2D pretrained models to 3D brain images by projecting surface-based cortical morphometry into planar images using computational geometry mapping. Firstly, 3D cortical meshes are reconstructed from magnetic resonance imaging (MRI) using FreeSurfer and projected into 2D planar meshes with topological preservation based on area-preserving geometry mapping. Then, 2D deep models pretrained on ImageNet are adopted and fine-tuned for cortical image classification on morphometric shape metrics. We apply the framework to sex classification on the Human Connectome Project dataset and autism spectrum disorder (ASD) classification on the Autism Brain Imaging Data Exchange dataset. Moreover, a 2-stage transfer learning strategy is suggested to boost the ASD classification performance by using the sex classification as an intermediate task. Our framework brings significant improvement in sex classification and ASD classification with transfer learning. In summary, the proposed framework builds a bridge between 3D cortical data and 2D models, making 2D pretrained models available for brain image analysis in cognitive and psychiatric neuroscience.
Affiliation(s)
- Kai Gao
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Zhipeng Fan
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Jianpo Su
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Ling-Li Zeng
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Hui Shen
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Jubo Zhu
- College of Science, National University of Defense Technology, Changsha 410073, China
- Dewen Hu
- College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
- Pazhou Lab, Guangzhou 510330, China
27
|
Abstract
As bridge inspection becomes more advanced and more ubiquitous, artificial intelligence (AI) techniques, such as machine and deep learning, could offer suitable solutions to the nation’s problems of overdue bridge inspections. Coupling AI with the various data that can be captured by unmanned aerial vehicles (UAVs) enables fully automated bridge inspections. The key to the success of automated bridge inspection is a model capable of detecting failures from UAV data such as images and videos. In this context, this paper investigates the performance of state-of-the-art convolutional neural networks (CNNs) through transfer learning for crack detection in UAV-based bridge inspection. The performance of different CNN models is evaluated via UAV-based inspection of Skodsberg Bridge, located in eastern Norway. The low-level features are extracted in the last layers of the CNN models and these layers are trained using 19,023 crack and non-crack images. There is always a trade-off between the number of trainable parameters that CNN models need to learn for each specific task and the number of non-trainable parameters that come from transfer learning. Therefore, selecting the optimal amount of transfer learning is a challenging task and, as there is not enough research in this area, it is studied in this paper. Moreover, UAV-based bridge inspection images require specific attention to establish a suitable dataset as the input of CNN models that are trained on homogeneous images. However, in the real implementation of CNN models on UAV-based bridge inspection images, there are always heterogeneities and noise, such as natural and artificial effects like different luminosities, spatial positions, and colors of the elements in an image. In this study, the effects of such heterogeneities on the performance of CNN models via transfer learning are examined.
The results demonstrate that with a simplified image cropping technique and with minimum effort to preprocess images, CNN models can distinguish crack elements from non-crack elements with 81% accuracy. Moreover, the results show that heterogeneities inherent in UAV-based bridge inspection data significantly affect the performance of CNN models, with an average 32.6% decrease in accuracy. It is also found that deeper CNN models do not provide higher accuracy than shallower CNN models when the number of images for adaptation to a specific task, in this case crack detection, is not large enough; in this study, with 19,023 images, shallower models outperform the deeper models.
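The trade-off between trainable and non-trainable parameters described in this abstract can be illustrated with a minimal sketch. The layer names and parameter counts below are hypothetical and are not taken from the paper; the sketch only shows how the frozen/trainable split shifts as more of the final layers are unfrozen.

```python
# Hypothetical per-layer parameter counts (assumption, not from the paper).
LAYER_PARAMS = [
    ("conv1", 1_792),
    ("conv2", 36_928),
    ("conv3", 73_856),
    ("fc1", 262_400),
    ("classifier", 258),
]


def split_parameters(layers, n_trainable_last):
    """Return (frozen, trainable) parameter totals when only the last
    n_trainable_last layers are updated and all earlier layers stay frozen."""
    if not 0 <= n_trainable_last <= len(layers):
        raise ValueError("n_trainable_last must be between 0 and len(layers)")
    cut = len(layers) - n_trainable_last
    frozen = sum(count for _, count in layers[:cut])
    trainable = sum(count for _, count in layers[cut:])
    return frozen, trainable


# Show how the split changes with the number of unfrozen layers.
for k in range(len(LAYER_PARAMS) + 1):
    frozen, trainable = split_parameters(LAYER_PARAMS, k)
    print(f"last {k} layers trainable: frozen={frozen}, trainable={trainable}")
```

Unfreezing more layers gives the model more capacity to adapt to the target task but also more parameters to fit from a limited number of images, which is the tension the abstract points to.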
|
28
|
Sarker IH. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput Sci 2021; 2:420. [PMID: 34426802 PMCID: PMC8372231 DOI: 10.1007/s42979-021-00815-1] [Citation(s) in RCA: 357] [Impact Index Per Article: 89.3] [Received: 05/29/2021] [Accepted: 08/07/2021] [Indexed: 11/26/2022]
Abstract
Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is nowadays considered a core technology of today’s Fourth Industrial Revolution (4IR or Industry 4.0). Due to its ability to learn from data, DL technology, which originated from the artificial neural network (ANN), has become a hot topic in the context of computing, and is widely applied in various application areas like healthcare, visual recognition, text analytics, cybersecurity, and many more. However, building an appropriate DL model is a challenging task, due to the dynamic nature and variations in real-world problems and data. Moreover, the lack of core understanding turns DL methods into black-box machines that hamper development at the standard level. This article presents a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of real-world tasks, such as supervised or unsupervised. In our taxonomy, we take into account deep networks for supervised or discriminative learning, unsupervised or generative learning, as well as hybrid learning and relevant others. We also summarize real-world application areas where deep learning techniques can be used. Finally, we point out ten potential aspects for future generation DL modeling with research directions. Overall, this article aims to draw a big picture of DL modeling that can be used as a reference guide for both academia and industry professionals.
Affiliation(s)
- Iqbal H. Sarker
- Swinburne University of Technology, Melbourne, VIC 3122 Australia
- Chittagong University of Engineering & Technology, Chittagong, 4349 Bangladesh
| |
|
29
|
Neuromodulated Dopamine Plastic Networks for Heterogeneous Transfer Learning with Hebbian Principle. Symmetry (Basel) 2021. [DOI: 10.3390/sym13081344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023] Open
Abstract
The plastic modifications in synaptic connectivity arise primarily from changes triggered by neuromodulated dopamine signals. These activities are controlled by neuromodulation, which is itself under the control of the brain. The subjective brain’s self-modifying abilities play an essential role in learning and adaptation. Artificial neural networks with neuromodulated plasticity are used to implement transfer learning in the image classification domain. In particular, this has applications in image detection, image segmentation, and transfer of learning parameters with significant results. This paper proposes a novel approach to enhance transfer learning accuracy in a heterogeneous source and target, using the neuromodulation of the Hebbian learning principle, called NDHTL (Neuromodulated Dopamine Hebbian Transfer Learning). Neuromodulation of plasticity offers a powerful new technique with applications in training neural networks implementing asymmetric backpropagation using Hebbian principles in transfer-learning-motivated convolutional neural networks (CNNs). Biologically motivated concomitant learning, where connected brain cells activate positively, enhances the synaptic connection strength between the network neurons. Using the NDHTL algorithm, the percentage of change of the plasticity between the neurons of the CNN layer is directly managed by the dopamine signal’s value. The discriminative nature of transfer learning fits well with the technique. The learned model’s connection weights must adapt to unseen target datasets with the least cost and effort in transfer learning. Using distinctive learning principles such as dopamine Hebbian learning in transfer learning for asymmetric gradient weight updates is a novel approach. The paper emphasizes the NDHTL algorithmic technique as synaptic plasticity controlled by dopamine signals in transfer learning to classify images using source-target datasets.
The standard transfer learning using gradient backpropagation is a symmetric framework. Experimental results using CIFAR-10 and CIFAR-100 datasets show that the proposed NDHTL algorithm can enhance transfer learning efficiency compared to existing methods.
|
30
|
Zhou K, Arslanturk S, Craig DB, Heath E, Draghici S. Discovery of primary prostate cancer biomarkers using cross cancer learning. Sci Rep 2021; 11:10433. [PMID: 34001952 PMCID: PMC8128891 DOI: 10.1038/s41598-021-89789-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/18/2021] [Accepted: 04/30/2021] [Indexed: 02/03/2023] Open
Abstract
Prostate cancer (PCa), the second leading cause of cancer death in American men, is a relatively slow-growing malignancy with multiple early treatment options. Yet, a significant number of low-risk PCa patients are over-diagnosed and over-treated with significant and long-term quality of life effects. Further, there is ever increasing evidence of metastasis and higher mortality when hormone-sensitive or castration-resistant PCa tumors are treated indistinctively. Hence, the critical need is to discover clinically-relevant and actionable PCa biomarkers by better understanding the biology of PCa. In this paper, we have discovered novel biomarkers of PCa tumors through cross-cancer learning by leveraging the pathological and molecular similarities in the DNA repair pathways of ovarian, prostate, and breast cancer tumors. Cross-cancer disease learning enriches the study population and identifies genetic/phenotypic commonalities that are important across diseases with pathological and molecular similarities. Our results show that ADIRF, SLC2A5, C3orf86, HSPA1B are among the most significant PCa biomarkers, while MTRNR2L1, EEPD1, TEPP and VN1R2 are jointly important biomarkers across prostate, breast and ovarian cancers. Our validation results have further shown that the discovered biomarkers can predict the disease state better than any randomly selected subset of differentially expressed prostate cancer genes.
Affiliation(s)
- Kaiyue Zhou
- Department of Computer Science, Wayne State University, Detroit, 48201, USA
| | - Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48201, USA.
| | - Douglas B Craig
- Department of Oncology, Wayne State University, Detroit, 48201, USA
- Bioinformatics and Biostatistics Core, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
| | - Elisabeth Heath
- Department of Oncology, Wayne State University, Detroit, 48201, USA
- Molecular Therapeutics Program, Barbara Ann Karmanos Cancer Institute, Detroit, 48201, USA
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48201, USA
| |
|
31
|
Jaya Ant lion optimization-driven Deep recurrent neural network for cancer classification using gene expression data. Med Biol Eng Comput 2021; 59:1005-1021. [PMID: 33851321 DOI: 10.1007/s11517-021-02350-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/26/2020] [Accepted: 03/17/2021] [Indexed: 10/21/2022]
Abstract
Cancer is one of the deadly diseases prevailing worldwide, and patients with cancer are rescued only when the cancer is detected at a very early stage. Early detection of cancer is essential because, in the final stage, the chance of survival is limited. The symptoms of cancers are rigorous and therefore, all the symptoms should be studied properly before the diagnosis. Thus, an automatic prediction system is necessary for classifying cancer as malignant or benign. Hence, this paper introduces a novel strategy based on the Jaya Ant lion optimization-based Deep recurrent neural network (JayaALO-based DeepRNN) for cancer classification. The steps followed in the developed model are data normalization, data transformation, feature dimension reduction, and classification. The first step is data normalization. The goal of data normalization is to eliminate data redundancy and to mitigate the storage of objects in a relational database that maintains the same information in several places. After that, the data transformation is carried out using a log transformation, which makes the patterns more interpretable, helps fulfill the assumptions, and reduces skew. Also, non-negative matrix factorization is employed for reducing the feature dimension. Finally, the proposed JayaALO-based DeepRNN method classifies cancer based on the reduced-dimension features to produce a satisfactory result. Thus, the resulting output of the proposed JayaALO-based DeepRNN is employed for cancer classification. The proposed JayaALO-based DeepRNN showed improved results with maximal accuracy of 95.97%, maximal sensitivity of 95.95%, and maximal specificity of 96.96%. The goal of this research is to devise a cancer classification strategy using the proposed JayaALO-based DeepRNN. It is required to detect the cancer at an early stage to prevent the destruction caused to the other organs.
The developed model involves four phases to perform the cancer classification, namely data normalization, data transformation, feature dimension reduction, and classification. Initially, the input data are gathered and normalized. The normalized data are fed to the data transformation step, which is performed using log transformation. The transformed data are fed to feature dimension reduction, which is performed using non-negative matrix factorization. The reduced features are employed in DeepRNN for cancer classification. The training of DeepRNN is done using the proposed JayaALO, which is designed by combining ALO and the Jaya algorithm.
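Two of the preprocessing phases named in this abstract, log transformation and non-negative matrix factorization (NMF), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `log2(x + 1)` form, the Lee-Seung multiplicative updates, the rank, and the toy matrix are all choices made here for the example, and the JayaALO-trained DeepRNN classifier is not reproduced.

```python
import math
import random


def log_transform(matrix):
    """Apply log2(x + 1) to every entry to reduce skew (assumed form)."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) into w (n x rank) and
    h (rank x m) with multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def squared_error(v, w, h):
    """Sum of squared differences between v and the product w h."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Toy expression matrix: 4 samples x 3 genes (made-up values).
counts = [[0, 10, 100], [1, 20, 200], [5, 3, 0], [7, 1, 2]]
features, components = nmf(log_transform(counts), rank=2)
print(len(features), "samples reduced to", len(features[0]), "features each")
```

The rows of `features` are the reduced-dimension representation that a downstream classifier would consume.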
|
32
|
Wang Z, Liu J, Chen X, Li G, Han H. Sparse self-attention aggregation networks for neural sequence slice interpolation. BioData Min 2021; 14:10. [PMID: 33522940 PMCID: PMC7852179 DOI: 10.1186/s13040-021-00236-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2020] [Accepted: 01/05/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Microscopic imaging is a crucial technology for visualizing neural and tissue structures. Large-area defects inevitably occur during the imaging process of electron microscope (EM) serial slices, which degrade registration and semantic segmentation and affect the accuracy of 3D reconstruction. The continuity of biological tissue among serial EM images makes it possible to recover missing tissues utilizing inter-slice interpolation. However, large deformation, noise, and blur among EM images keep the task challenging. Existing flow-based and kernel-based methods have to perform frame interpolation on images with little noise and low blur. They also cannot effectively deal with large deformations on EM images. Results: In this paper, we propose a sparse self-attention aggregation network to synthesize pixels following the continuity of biological tissue. First, we develop an attention-aware layer for consecutive EM image interpolation that implicitly adopts global perceptual deformation. Second, we present an adaptive style-balance loss taking the style differences of serial EM images such as blur and noise into consideration. Guided by the attention-aware module, adaptively synthesizing each pixel aggregated from the global domain further improves the performance of pixel synthesis. Quantitative and qualitative experiments show that the proposed method is superior to the state-of-the-art approaches. Conclusions: The proposed method can be considered as an effective strategy to model the relationship between each pixel and other pixels from the global domain. This approach improves the algorithm’s robustness to noise and large deformation, and can accurately predict the effective information of the missing region, which will greatly promote the data analysis of neurobiological research.
Affiliation(s)
- Zejin Wang
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100190, China
| | - Jing Liu
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100190, China
| | - Xi Chen
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China
| | - Guoqing Li
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China.
| | - Hua Han
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China; Center for Excellence in Brain Science and Intelligence Technology Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China; School of Future Technology, University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100190, China.
| |
|
33
|
Xia K, Ni T, Yin H, Chen B. Cross-Domain Classification Model With Knowledge Utilization Maximization for Recognition of Epileptic EEG Signals. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:53-61. [PMID: 32078557 DOI: 10.1109/tcbb.2020.2973978] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Conventional classification models for epileptic EEG signal recognition need sufficient labeled samples as a training dataset. In addition, when training and testing EEG signal samples are collected from different distributions, for example, due to differences in patient groups or acquisition devices, such methods generally cannot perform well. In this paper, a cross-domain classification model with knowledge utilization maximization called CDC-KUM is presented, which takes advantage of the global data structure provided by the labeled samples in the related domain and unlabeled samples in the current domain. By mapping the data into kernel space, the pairwise constraint regularization term is combined with the predictive differences of the labeled data in the source domain. Meanwhile, the soft clustering regularization term using quadratic weights and Gini-Simpson diversity is applied to exploit the distribution information of unlabeled data in the target domain. Experimental results show that the CDC-KUM model outperformed several traditional non-transfer and transfer classification methods for recognition of epileptic EEG signals.
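For readers unfamiliar with the Gini-Simpson diversity mentioned in this abstract, the index for a probability vector p is 1 - Σ p_i². The sketch below computes it for one sample's soft-cluster memberships. How CDC-KUM embeds the index in its regularization term is not stated in the abstract, so the function name and the normalization step are illustrative assumptions only.

```python
def gini_simpson(memberships):
    """Gini-Simpson diversity 1 - sum(p_i^2) of a non-negative membership
    vector, which is normalized to sum to 1 before the index is computed."""
    if any(m < 0 for m in memberships):
        raise ValueError("memberships must be non-negative")
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must have a positive sum")
    probs = [m / total for m in memberships]
    return 1.0 - sum(p * p for p in probs)


# A sample assigned entirely to one cluster has zero diversity, while a
# sample spread evenly over the clusters has the largest possible value.
print(gini_simpson([1.0, 0.0, 0.0]))
print(gini_simpson([0.25, 0.25, 0.25, 0.25]))
```

Low values indicate a confident, nearly hard assignment; high values indicate that the sample is spread across clusters.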
|
34
|
Mignone P, Pio G, Džeroski S, Ceci M. Multi-task learning for the simultaneous reconstruction of the human and mouse gene regulatory networks. Sci Rep 2020; 10:22295. [PMID: 33339842 PMCID: PMC7749184 DOI: 10.1038/s41598-020-78033-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/10/2020] [Accepted: 10/29/2020] [Indexed: 12/31/2022] Open
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negative examples are available. In this paper we propose a multi-task method that is able to simultaneously reconstruct the human and the mouse GRNs using the similarities between the two. This is done by exploiting, in a transfer learning approach, possible dependencies that may exist among them. Simultaneously, we solve the issues arising from the limited availability of examples of links by relying on a novel clustering-based approach, able to estimate the degree of certainty of unlabeled examples of links, so that they can be exploited during the training together with the labeled examples. Our experiments show that the proposed method can reconstruct both the human and the mouse GRNs more effectively compared to reconstructing each network separately. Moreover, it significantly outperforms three state-of-the-art transfer learning approaches that, analogously to our method, can exploit the knowledge coming from both organisms. Finally, a specific robustness analysis reveals that, even when the number of labeled examples is very low with respect to the number of unlabeled examples, the proposed method is almost always able to outperform its single-task counterpart.
Affiliation(s)
- Paolo Mignone
- Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
| | - Gianvito Pio
- Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy.
| | - Sašo Džeroski
- Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
| | - Michelangelo Ceci
- Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy; Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
| |
|
35
|
Deep transfer learning for reducing health care disparities arising from biomedical data inequality. Nat Commun 2020; 11:5131. [PMID: 33046699 PMCID: PMC7552387 DOI: 10.1038/s41467-020-18918-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 04/22/2020] [Accepted: 09/16/2020] [Indexed: 12/20/2022] Open
Abstract
As artificial intelligence (AI) is increasingly applied to biomedical research and clinical decisions, developing unbiased AI models that work equally well for all ethnic groups is of crucial importance to health disparity prevention and reduction. However, the biomedical data inequality between different ethnic groups is set to generate new health care disparities through data-driven, algorithm-based biomedical research and clinical decisions. Using an extensive set of machine learning experiments on cancer omics data, we find that current prevalent schemes of multiethnic machine learning are prone to generating significant model performance disparities between ethnic groups. We show that these performance disparities are caused by data inequality and data distribution discrepancies between ethnic groups. We also find that transfer learning can improve machine learning model performance for data-disadvantaged ethnic groups, and thus provides an effective approach to reduce health care disparities arising from data inequality among ethnic groups.
|
36
|
Improvement of Heterogeneous Transfer Learning Efficiency by Using Hebbian Learning Principle. Applied Sciences-Basel 2020. [DOI: 10.3390/app10165631] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/11/2023]
Abstract
Transfer learning algorithms have been widely studied for machine learning in recent times. In particular, in image recognition and classification tasks, transfer learning has shown significant benefits, and is getting plenty of attention in the research community. When transferring knowledge between source and target tasks, a homogeneous dataset is not always available, and a heterogeneous dataset can be chosen in certain circumstances. In this article, we propose a way of improving transfer learning efficiency, in the case of a heterogeneous source and target, by using the Hebbian learning principle, called Hebbian transfer learning (HTL). In computer vision, biologically motivated approaches such as Hebbian learning represent associative learning, where simultaneous activation of brain cells positively affects the increase in synaptic connection strength between the individual cells. The discriminative nature of learning for the search of features in the task of image classification fits well with techniques such as the Hebbian learning rule: neurons that fire together wire together. Deep learning models, such as convolutional neural networks (CNNs), are widely used for image classification. In transfer learning, for such models, the connection weights of the learned model should adapt to a new target dataset with minimum effort. A discriminative learning rule, such as Hebbian learning, can improve learning performance by quickly adapting to discriminate between the different classes defined by the target task. We apply the Hebbian principle as synaptic plasticity in transfer learning for classification of images using a heterogeneous source-target dataset, and compare results with the standard transfer learning case.
Experimental results using CIFAR-10 (Canadian Institute for Advanced Research) and CIFAR-100 datasets with various combinations show that the proposed HTL algorithm can improve the performance of transfer learning, especially in the case of a heterogeneous source and target dataset.
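The Hebbian rule quoted in this abstract ("neurons that fire together wire together") is commonly written as Δw_ij = η · y_i · x_j, where x is the presynaptic activity, y the postsynaptic activity, and η a learning rate. The following is a minimal sketch of that basic rule only; it is not the HTL algorithm itself, whose details are not given in the abstract, and the function name and values are illustrative.

```python
def hebbian_update(weights, pre, post, lr=0.01):
    """Return new weights with w[i][j] increased by lr * post[i] * pre[j].

    weights: list of rows, one row per postsynaptic unit
    pre:     presynaptic activations, one per column
    post:    postsynaptic activations, one per row
    """
    if len(weights) != len(post) or any(len(row) != len(pre) for row in weights):
        raise ValueError("weights must have shape len(post) x len(pre)")
    return [[w_ij + lr * post[i] * pre[j] for j, w_ij in enumerate(row)]
            for i, row in enumerate(weights)]


# Only connections whose input and output are both active are strengthened.
w = [[0.0, 0.0], [0.0, 0.0]]
w = hebbian_update(w, pre=[1.0, 0.0], post=[1.0, 1.0], lr=0.5)
print(w)
```

In this toy call the first input is active and the second is silent, so only the weights in the first column change.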
|
37
|
Patel SK, George B, Rai V. Artificial Intelligence to Decode Cancer Mechanism: Beyond Patient Stratification for Precision Oncology. Front Pharmacol 2020; 11:1177. [PMID: 32903628 PMCID: PMC7438594 DOI: 10.3389/fphar.2020.01177] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/25/2020] [Accepted: 07/20/2020] [Indexed: 12/13/2022] Open
Abstract
The multitude of multi-omics data generated cost-effectively using advanced high-throughput technologies has created a challenging domain for research in Artificial Intelligence (AI). Data curation poses a significant challenge as different parameters, instruments, and sample preparation approaches are employed for generating these big data sets. AI could reduce the fuzziness and randomness in data handling and build a platform for the data ecosystem, and thus serve as the primary choice for data mining and big data analysis to make informed decisions. However, the application of AI remains intricate for researchers/clinicians lacking specific training in computational tools and informatics. Cancer is a major cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018. Certain cancers, such as pancreatic and gastric cancers, are detected only after they have reached their advanced stages with frequent relapses. Cancer is one of the most complex diseases, affecting a range of organs with diverse disease progression mechanisms and effectors ranging from gene-epigenetics to a wide array of metabolites. Hence a comprehensive study, including genomics, epi-genomics, transcriptomics, proteomics, and metabolomics, along with the medical/mass-spectrometry imaging, patient clinical history, treatments provided, genetics, and disease endemicity, is essential. The Cancer Moonshot℠ Research Initiatives by the NIH National Cancer Institute aim to collect as much information as possible from different regions of the world and build a cancer data repository. AI could play an immense role in (a) analysis of complex and heterogeneous data sets (multi-omics and/or inter-omics), (b) data integration to provide a holistic disease molecular mechanism, (c) identification of diagnostic and prognostic markers, and (d) monitoring of patients' responses to drugs/treatments and recovery.
AI enables precision disease management well beyond the prevalent disease stratification patterns, such as differential expression and supervised classification. This review highlights critical advances and challenges in omics data analysis, dealing with data variability from lab-to-lab, and data integration. We also describe methods used in data mining and AI methods to obtain robust results for precision medicine from "big" data. In the future, AI could be expanded to achieve ground-breaking progress in disease management.
Affiliation(s)
- Sandip Kumar Patel
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Buck Institute for Research on Aging, Novato, CA, United States
| | - Bhawana George
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
| | - Vineeta Rai
- Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, United States
| |
|
38
|
Classification of Microarray Gene Expression Data Using an Infiltration Tactics Optimization (ITO) Algorithm. Genes (Basel) 2020; 11:genes11070819. [PMID: 32708429 PMCID: PMC7397166 DOI: 10.3390/genes11070819] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 11/17/2022] Open
Abstract
A number of different feature selection and classification techniques have been proposed in the literature, including parameter-free and parameter-based algorithms. The former are quick but may result in local maxima, while the latter use dataset-specific parameter-tuning for higher accuracy. However, higher accuracy may not necessarily mean higher reliability of the model. Thus, generalized optimization is still a challenge open for further research. This paper presents a warzone-inspired "infiltration tactics" based optimization algorithm (ITO), not to be confused with the ITO algorithm based on the Itô process in the field of stochastic calculus. The proposed ITO algorithm combines parameter-free and parameter-based classifiers to produce a high-accuracy-high-reliability (HAHR) binary classifier. The algorithm produces results in two phases: (i) the Lightweight Infantry Group (LIG) converges quickly to find non-local maxima and produces comparable results (i.e., 70 to 88% accuracy); and (ii) the Followup Team (FT) uses advanced tuning to enhance the baseline performance (i.e., 75 to 99%). Every soldier of the ITO army is a base model with its own independently chosen subset selection method, pre-processing and validation methods, and classifier. The successful soldiers are combined through heterogeneous ensembles for optimal results. The proposed approach addresses a data scarcity problem, is flexible in the choice of heterogeneous base classifiers, and is able to produce HAHR models comparable to the established MAQC-II results.
|
39
|
Menaga D, Revathi S. An Empirical Study of Cancer Classification Techniques Based on the Neural Networks. Biomedical Engineering: Applications, Basis and Communications 2020. [DOI: 10.4015/s1016237220500131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/09/2022]
Abstract
Cancer is one of the most common dreadful diseases prevailing worldwide, and patients with cancer are rescued only when the cancer is detected at a very early stage. Early detection of cancer is essential because, in the fourth stage, the chance of survival is limited. The symptoms of cancers are rigorous, and therefore, all the symptoms should be studied properly before the diagnosis. Thus, an automatic prediction system is necessary for classifying the tumor as malignant or benign. Over the past few years, research on cancer classification has increased rapidly, but there is no general technique to find novel cancer classes (class discovery) or to assign tumors to known classes. Accordingly, this survey analyzes distinct cancer classification techniques. This review article provides a detailed review of 50 research papers presenting cancer classification techniques, such as deep learning-based techniques, neural network-based techniques, and hybrid techniques. Moreover, an elaborate analysis and discussion are made based on the year of publication, utilized datasets, accuracy range, evaluation metrics, implementation tool, and adopted classification methods. Finally, the research gaps and issues of various cancer classification schemes are presented to guide researchers toward future work.
Affiliation(s)
- D. Menaga
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
| | - S. Revathi
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
| |
|
40
|
López-García G, Jerez JM, Franco L, Veredas FJ. Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data. PLoS One 2020; 15:e0230536. [PMID: 32214348 PMCID: PMC7098575 DOI: 10.1371/journal.pone.0230536] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 11/29/2019] [Accepted: 03/02/2020] [Indexed: 12/17/2022] Open
Abstract
Precision medicine in oncology aims at obtaining data from heterogeneous sources to have a precise estimation of a given patient’s state and prognosis. With the purpose of advancing toward a personalized medicine framework, accurate diagnoses allow prescription of more effective treatments adapted to the specificities of each individual case. In recent years, next-generation sequencing has impelled cancer research by providing physicians with an overwhelming amount of gene-expression data from RNA-seq high-throughput platforms. In this scenario, data mining and machine learning techniques have widely contributed to gene-expression data analysis by supplying computational models to support decision-making on real-world data. Nevertheless, existing public gene-expression databases are characterized by the unfavorable imbalance between the huge number of genes (in the order of tens of thousands) and the small number of samples (in the order of a few hundreds) available. Although diverse feature selection and extraction strategies have traditionally been applied to overcome the resulting over-fitting issues, the efficacy of standard machine learning pipelines is far from satisfactory for the prediction of relevant clinical outcomes like follow-up end-points or patient’s survival. Using the public Pan-Cancer dataset, in this study we pre-train convolutional neural network architectures for survival prediction on a subset composed of thousands of gene-expression samples from thirty-one tumor types. The resulting architectures are subsequently fine-tuned to predict lung cancer progression-free interval. The application of convolutional networks to gene-expression data has many limitations, derived from the unstructured nature of these data. In this work we propose a methodology to rearrange RNA-seq data by transforming RNA-seq samples into gene-expression images, from which convolutional networks can extract high-level features.
As an additional objective, we investigate whether leveraging the information extracted from other tumor-type samples contributes to the extraction of high-level features that improve lung cancer progression prediction, compared to other machine learning approaches.
Affiliation(s)
- Guillermo López-García
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- José M. Jerez
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- Leonardo Franco
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- Francisco J. Veredas
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
|
41
|
Capobianco E, Dominietto M. From Medical Imaging to Radiomics: Role of Data Science for Advancing Precision Health. J Pers Med 2020; 10:jpm10010015. [PMID: 32121633 PMCID: PMC7151556 DOI: 10.3390/jpm10010015] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/13/2020] [Accepted: 02/17/2020] [Indexed: 12/17/2022] [Open Access]
Abstract
Treating disease according to precision health requires the individualization of therapeutic solutions as a cardinal step in a process that typically depends on multiple factors. The starting point is the collection and assembly of data over time to assess the patient’s health status and monitor response to therapy. Radiomics is a very important component of this process. Its main goal is to implement a protocol that quantifies the informative content of images by first mining and then extracting the most representative features. Further analysis aims to detect potential disease phenotypes through signs and marks of heterogeneity. As multimodal images hinge on various data sources, and these can be integrated with treatment plans and follow-up information, radiomics is naturally centered on dynamically monitoring disease progression and/or the health trajectory of patients. However, radiomics creates critical needs too. A concise list includes: (a) successful harmonization of intra/inter-modality radiomic measurements to facilitate the association with other data domains (genetic, clinical, lifestyle aspects, etc.); (b) the ability of data science to revise model strategies and analytics tools to tackle multiple data types and structures (electronic medical records, personal histories, hospitalization data, genomic data from various specimens, imaging, etc.) and to offer data-agnostic solutions for patient outcome prediction; and (c) model validation with independent datasets to ensure generalization of results, clinical value of new risk stratifications, and support for clinical decisions in highly individualized patient management.
Affiliation(s)
- Enrico Capobianco
- Center for Computational Science, University of Miami, FL 33146, USA
|
42
|
Ali H, Sharif M, Yasmin M, Rehmani MH, Riaz F. A survey of feature extraction and fusion of deep learning for detection of abnormalities in video endoscopy of gastrointestinal-tract. Artif Intell Rev 2019. [DOI: 10.1007/s10462-019-09743-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
|
43
|
Azuaje F. Artificial intelligence for precision oncology: beyond patient stratification. NPJ Precis Oncol 2019; 3:6. [PMID: 30820462 PMCID: PMC6389974 DOI: 10.1038/s41698-019-0078-1] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 11/05/2018] [Accepted: 01/22/2019] [Indexed: 12/18/2022] [Open Access]
Abstract
The data-driven identification of disease states and treatment options is a crucial challenge for precision oncology. Artificial intelligence (AI) offers unique opportunities for enhancing such predictive capabilities in the lab and the clinic. AI, including its best-known branch of research, machine learning, has significant potential to enable precision oncology well beyond relatively well-known pattern recognition applications, such as the supervised classification of single-source omics or imaging datasets. This perspective highlights key advances and challenges in that direction. Furthermore, it argues that AI's scope and depth of research need to be expanded to achieve ground-breaking progress in precision oncology.
Affiliation(s)
- Francisco Azuaje
- Bioinformatics and Modelling Research Group, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Present Address: Computational Biomedicine Research Group, Center for Quantitative Biology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
|