1
Madhan S, Kalaiselvan A. Omics data classification using constitutive artificial neural network optimized with single candidate optimizer. NETWORK (BRISTOL, ENGLAND) 2025; 36:343-367. [PMID: 38736309 DOI: 10.1080/0954898x.2024.2348726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/14/2024]
Abstract
Recent technical advances enable high-throughput, low-cost omics-based study of biological molecules, including genomics, proteomics, and microbiomics. To classify such data, Omics Data Classification using Constitutive Artificial Neural Network Optimized with Single Candidate Optimizer (ODC-ZOA-CANN-SCO) is proposed in this manuscript. The input data are pre-processed with Adaptive Variational Bayesian Filtering (AVBF) to replace missing values. The pre-processed data are fed to the Zebra Optimization Algorithm (ZOA) for dimensionality reduction. Then, the Constitutive Artificial Neural Network (CANN) is employed to classify the omics data, with its weight parameters optimized by the Single Candidate Optimizer (SCO). The proposed ODC-ZOA-CANN-SCO method attains 25.36%, 21.04%, 22.18%, 26.90%, and 28.12% higher accuracy than the following existing methods, respectively: multi-omics data integration utilizing adaptive graph learning and attention mode for patient categorization with biomarker identification (MOD-AGL-AM-PABI); a deep learning method based on multi-omics data integration to create a risk stratification prediction model for skin cutaneous melanoma (DL-MODI-RSP-SCM); a deep belief network-based model for identifying Alzheimer's disease utilizing multi-omics data (DDN-DAD-MOD); hybrid cancer prediction based on multi-omics data and reinforcement learning state-action-reward-state-action (HCP-MOD-RL-SARSA); and a machine learning-based method using omics data and a biological knowledge database for cancer clinical endpoint prediction (ML-ODBKD-CCEP).
Affiliation(s)
- Subramaniam Madhan
  - Department of Computer Science and Engineering, University College of Engineering, Thirukkuvalai (A Constituent College of Anna University Chennai), Nagapattinam, Tamilnadu, India
- Anbarasan Kalaiselvan
  - Department of Science and Humanities, University College of Engineering, Thirukkuvalai (A Constituent College of Anna University Chennai), Nagapattinam, Tamilnadu, India
2
Bedi P, Rani S, Gupta B, Bhasin V, Gole P. EpiBrCan-Lite: A lightweight deep learning model for breast cancer subtype classification using epigenomic data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108553. [PMID: 39667144 DOI: 10.1016/j.cmpb.2024.108553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 11/14/2024] [Accepted: 12/03/2024] [Indexed: 12/14/2024]
Abstract
BACKGROUND AND OBJECTIVES Early classification of breast cancer subtypes improves the survival rate because it facilitates patient prognosis. In the literature, this problem has mainly been addressed with various Machine Learning and Deep Learning techniques. However, these studies have three major shortcomings: a huge number of Trainable Weight Parameters (TWP), low performance, and class imbalance.
METHODS This paper proposes a lightweight model named EpiBrCan-Lite for classifying breast cancer subtypes using DNA methylation data. The model comprises three blocks: Data Encoding, TransGRU, and Classification. In the Data Encoding block, the input features are encoded into equal-sized chunks and then passed to the TransGRU block, which is a modified version of the traditional Transformer Encoder (TE). In the TransGRU block, the MLP module of the traditional TE is replaced by a GRU module consisting of two GRU layers, which reduces the TWP and captures the long-range dependencies of the input feature data. The output of the TransGRU block is then passed to the Classification block, which classifies breast cancer into its subtypes.
RESULTS The proposed model is validated using Accuracy, Precision, Recall, F1-score, FPR, and FNR metrics on the TCGA breast cancer dataset. This dataset suffers from class imbalance, which is mitigated using the Synthetic Minority Oversampling Technique (SMOTE). Experimental results demonstrate that the EpiBrCan-Lite model attained 95.85 % accuracy, 95.96 % recall, 95.85 % precision, 95.90 % F1-score, 1.03 % FPR, and 4.12 % FNR despite using only 1/1500 of the TWP of other state-of-the-art models.
CONCLUSION The EpiBrCan-Lite model classifies breast cancer subtypes efficiently and, being lightweight, is suitable for deployment on devices with low computational power.
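The abstract names SMOTE as the remedy for class imbalance but does not describe it. The following is a minimal sketch of SMOTE's core interpolation step only, assuming Euclidean distance; the function name `smote_oversample`, the parameters `k` and `seed`, and the toy data are hypothetical and are not taken from the EpiBrCan-Lite code.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
```

Each synthetic point lies on the segment between two real minority samples, so oversampling adds no values outside the observed range of the minority class.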
Affiliation(s)
- Punam Bedi
  - Department of Computer Science, University of Delhi, Delhi, India
- Surbhi Rani
  - Department of Computer Science, University of Delhi, Delhi, India
- Bhavna Gupta
  - Keshav Mahavidyalaya, University of Delhi, New Delhi, India
- Veenu Bhasin
  - PGDAV College, University of Delhi, New Delhi, India
- Pushkar Gole
  - Department of Computer Science, University of Delhi, Delhi, India
3
Emmanuel J, Isewon I, Oyelade J. An optimized deep-forest algorithm using a modified differential evolution optimization algorithm: A case of host-pathogen protein-protein interaction prediction. Comput Struct Biotechnol J 2025; 27:595-611. [PMID: 39995682 PMCID: PMC11849198 DOI: 10.1016/j.csbj.2025.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Revised: 01/21/2025] [Accepted: 01/21/2025] [Indexed: 02/26/2025] [Open access]
Abstract
Deep Forest employs forest structures and leverages deep architecture to learn feature vector information adaptively. However, deep forest-based models have limitations such as manual hyperparameter optimization and time and memory usage inefficiencies. Bayesian optimization is a widely used model-based hyperparameter optimization method. Evolutionary algorithms such as Differential Evolution (DE) have recently been introduced to improve Bayesian optimization's acquisition function. Despite its effectiveness, DE has a significant drawback as it relies on randomly selecting indices from the population of target vectors to construct donor vectors in search of optimal solutions. This randomness is ineffective, as suboptimal or redundant indices may be selected. Therefore, in this research we developed a modified differential evolution (DE) acquisition function for improved host-pathogen protein-protein interaction prediction. The modified DE introduces a weighted and adaptive donor vector technique that selects the best-fitted donor vectors as opposed to the random approach. This modified optimization approach was implemented in a deep forest model for automatic hyperparameter optimization. The performance of the optimized deep forest model was evaluated on human-Plasmodium falciparum protein sequence datasets using 10-fold cross-validation. The results were compared with standard optimization methods such as traditional Bayesian optimization, genetic algorithms, evolutionary strategies, and other machine learning models. The optimized model achieved an accuracy of 89.3 %, outperforming other models across all metrics, including a sensitivity of 85.4 % and a precision of 91.6 %. Additionally, the optimized model predicted seven novel host-pathogen interactions. Finally, the model was implemented as a web application which is accessible at http://dfh3pi.covenantuniversity.edu.ng.
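The contrast between random and fitness-aware donor construction can be sketched as follows. This is an illustration only: the abstract does not give the authors' weighting scheme, so the rank-based weights, the helper names (`random_donor`, `weighted_indices`, `fitness_weighted_donor`), and the toy `sphere` objective are all assumptions.

```python
import random


def sphere(x):
    """Toy objective to minimise (stand-in for a real loss)."""
    return sum(v * v for v in x)


def random_donor(pop, i, F, rng):
    """Classic DE/rand/1: three distinct random indices, none equal to i."""
    r1, r2, r3 = rng.sample([j for j in range(len(pop)) if j != i], 3)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def weighted_indices(fitness, i, rng, n=3):
    """Draw n distinct indices (excluding i), favouring lower-loss vectors.
    Rank-based weights are an assumption made for this sketch."""
    candidates = sorted((j for j in range(len(fitness)) if j != i),
                        key=lambda j: fitness[j])  # best first
    weights = [len(candidates) - rank for rank in range(len(candidates))]
    chosen = []
    for _ in range(n):
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(pick))  # sample without replacement
        weights.pop(pick)
    return chosen


def fitness_weighted_donor(pop, fitness, i, F, rng):
    """Same mutation formula as DE/rand/1, but with fitness-biased indices."""
    r1, r2, r3 = weighted_indices(fitness, i, rng)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


rng = random.Random(42)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
fitness = [sphere(x) for x in pop]
donor_random = random_donor(pop, 0, 0.5, rng)
donor_weighted = fitness_weighted_donor(pop, fitness, 0, 0.5, rng)
```

Both functions apply the same mutation formula; only the way the three indices are chosen differs, which is the part of DE the abstract identifies as the weakness.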
Affiliation(s)
- Jerry Emmanuel
  - Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
  - Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Nigeria
- Itunuoluwa Isewon
  - Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
  - Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Nigeria
- Jelili Oyelade
  - Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
  - Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Nigeria
4
Jiang L, Jia L, Wang Y, Wu Y, Yue J. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets. Interdiscip Sci 2024; 16:1019-1037. [PMID: 38758306 DOI: 10.1007/s12539-024-00635-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
Copy number variation (CNV) is an essential genetic driving factor of cancer formation and progression, making intelligent classification based on CNV feasible. However, there are a few challenges in the current machine learning and deep learning methods, such as the design of base classifier combination schemes in ensemble methods and the selection of layers of neural networks, which often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to further enhance the accuracy and applicability of these methods for intelligent classification on CNV datasets. In this model, a feature selection module is introduced to mitigate the interference of redundant information, and a bilinear model based on the gated attention mechanism is proposed to extract more beneficial deep fusion features. Furthermore, an adaptive base classifier selection scheme is designed to overcome the difficulty of manually designing base classifier combinations and enhance the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule is constructed, effectively avoiding getting stuck in local solutions and missing some valuable information. Numerous experiments have demonstrated that our Adap-BDCM model exhibits optimal performance in cancer classification, stage prediction, and recurrence on CNV datasets. This study can assist physicians in making diagnoses faster and better.
Affiliation(s)
- Liancheng Jiang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Liye Jia
  - College of Computer Science and Technology, Taiyuan Normal University, Taiyuan, 030619, China
- Yizhen Wang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Yongfei Wu
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Junhong Yue
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
5
Jia L, Jiang L, Yue J, Hao F, Wu Y, Liu X. MLW-BFECF: A Multi-Weighted Dynamic Cascade Forest Based on Bilinear Feature Extraction for Predicting the Stage of Kidney Renal Clear Cell Carcinoma on Multi-Modal Gene Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2568-2579. [PMID: 39453793 DOI: 10.1109/tcbb.2024.3486742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2024]
Abstract
The stage prediction of kidney renal clear cell carcinoma (KIRC) is important for the diagnosis, personalized treatment, and prognosis of patients. Many prediction methods have been proposed, but most of them are based on unimodal gene data, and their accuracy is difficult to improve further. Therefore, we propose a novel multi-weighted dynamic cascade forest based on bilinear feature extraction (MLW-BFECF) for stage prediction of KIRC using multimodal gene data (RNA-seq, CNA, and methylation). The proposed model utilizes a dynamic cascade framework with shuffle layers to prevent early degradation of the model. In each cascade layer, a voting technique based on three gene selection algorithms is first employed to retain the gene features most relevant to KIRC and eliminate redundant information. Then, two new bilinear models based on the gated attention mechanism are proposed to better extract new intra-modal and inter-modal gene features. Finally, based on the idea of bagging, a multi-weighted ensemble forest classifier module is proposed to extract and fuse probabilistic features of the three-modal gene data. A series of experiments demonstrates that the MLW-BFECF model, applied to the three-modal KIRC dataset, achieves the highest prediction performance, with an accuracy of 88.9 %.
6
Shen J, Guo X, Bai H, Luo J. CAEM-GBDT: a cancer subtype identifying method using multi-omics data and convolutional autoencoder network. FRONTIERS IN BIOINFORMATICS 2024; 4:1403826. [PMID: 39077754 PMCID: PMC11284046 DOI: 10.3389/fbinf.2024.1403826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 06/13/2024] [Indexed: 07/31/2024] [Open access]
Abstract
The identification of cancer subtypes plays a very important role in medicine. Accurate identification of cancer subtypes is helpful for both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data capture various characteristics of cancer and can therefore improve the accuracy of cancer subtype identification. How to extract features from multi-omics data for cancer subtype identification is thus the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input and adopts a convolutional autoencoder network to identify cancer subtypes. Through a convolutional encoder layer, the method performs feature extraction on the input data. Within the convolutional encoder layer, a convolutional self-attention module is embedded to recognize higher-level representations of the multi-omics data. The high-level representations extracted by the convolutional encoder are then concatenated with the input to the decoder. A GBDT (Gradient Boosting Decision Tree) is utilized for cancer subtype identification. In the experiments, we compare CAEM-GBDT with existing cancer subtype identification methods. Experimental results demonstrate that the proposed CAEM-GBDT outperforms the other methods. The source code is available from GitHub at https://github.com/gxh-1/CAEM-GBDT.git.
Affiliation(s)
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo, China
7
Jaganathan D, Balasubramaniam S, Sureshkumar V, Dhanasekaran S. Revolutionizing Breast Cancer Diagnosis: A Concatenated Precision through Transfer Learning in Histopathological Data Analysis. Diagnostics (Basel) 2024; 14:422. [PMID: 38396461 PMCID: PMC10887508 DOI: 10.3390/diagnostics14040422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 02/03/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] [Open access]
Abstract
Breast cancer remains a significant global public health concern, emphasizing the critical role of accurate histopathological analysis in diagnosis and treatment planning. In recent years, deep learning techniques have shown notable potential for improving the precision and efficiency of histopathological data analysis. The proposed work introduces a novel approach that uses Transfer Learning to capitalize on knowledge from pre-trained models, adapting it to breast cancer histopathology. Our proposed model, a Transfer Learning-based concatenated model, exhibits substantial performance enhancements compared to traditional methodologies. Leveraging well-established pretrained models such as VGG-16, MobileNetV2, ResNet50, and DenseNet121, each a Convolutional Neural Network architecture designed for classification tasks, this study tunes hyperparameters to optimize model performance. The concatenated classification model is systematically benchmarked against the individual classifiers on histopathological data. Our concatenated model achieves a training accuracy of 98%. The outcomes of our experiments underscore the efficacy of this four-level concatenated model in advancing the accuracy of breast cancer histopathological data analysis. By combining the strengths of deep learning and transfer learning, our approach holds the potential to augment the diagnostic capabilities of pathologists, thereby contributing to more informed and personalized treatment planning for individuals diagnosed with breast cancer. This research is a promising step toward using such technology to refine the understanding and management of breast cancer at the intersection of artificial intelligence and healthcare.
Affiliation(s)
- Dhayanithi Jaganathan
  - Department of Computer Science and Engineering, Sona College of Technology, Salem 636005, India
- Vidhushavarshini Sureshkumar
  - Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, Chennai 600026, India
8
Yao L, Guan J, Li W, Chung CR, Deng J, Chiang YC, Lee TY. Identifying Antitubercular Peptides via Deep Forest Architecture with Effective Feature Representation. Anal Chem 2024; 96:1538-1546. [PMID: 38226973 DOI: 10.1021/acs.analchem.3c04196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Tuberculosis (TB) is a severe disease caused by Mycobacterium tuberculosis that poses a significant threat to human health. The emergence of drug-resistant strains has made the global fight against TB even more challenging. Antituberculosis peptides (ATPs) have shown promising results as a potential treatment for TB. However, conventional wet lab-based approaches to ATP discovery are time-consuming and costly and often fail to discover peptides with the desired properties. To address these challenges, we propose a novel machine learning-based framework called ATPfinder that can significantly accelerate the discovery of ATPs. Our approach integrates various efficient peptide descriptors and utilizes the deep forest algorithm to construct the model. This neural network-like cascading structure can effectively process and mine features without complex hyperparameter tuning. Our experimental results show that ATPfinder outperforms existing ATP prediction tools, achieving state-of-the-art performance with an accuracy of 89.3% and an MCC of 0.70. Moreover, our framework exhibits better robustness than baseline algorithms commonly used for other sequence analysis tasks. Additionally, the excellent interpretability of our model can assist researchers in understanding the critical features of ATPs. Finally, we developed a downloadable desktop application to simplify the use of our framework for researchers. Therefore, ATPfinder can facilitate the discovery of peptide drugs and provide potential solutions for TB treatment. Our framework is freely available at https://github.com/lantianyao/ATPfinder/ (data sets and code) and https://awi.cuhk.edu.cn/dbAMP/ATPfinder.html (software).
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Wenshuo Li
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
9
Zhong Y, Peng Y, Lin Y, Chen D, Zhang H, Zheng W, Chen Y, Wu C. MODILM: towards better complex diseases classification using a novel multi-omics data integration learning model. BMC Med Inform Decis Mak 2023; 23:82. [PMID: 37147619 PMCID: PMC10161645 DOI: 10.1186/s12911-023-02173-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2022] [Accepted: 04/11/2023] [Indexed: 05/07/2023] [Open access]
Abstract
BACKGROUND Accurately classifying complex diseases is crucial for diagnosis and personalized treatment. Integrating multi-omics data has been demonstrated to enhance the accuracy of analyzing and classifying complex diseases. This can be attributed to the strong correlation between such data and various diseases, as well as the comprehensive and complementary information the data provide. However, integrating multi-omics data for complex diseases is challenged by data characteristics such as high imbalance, scale variation, heterogeneity, and noise interference. These challenges further emphasize the importance of developing effective methods for multi-omics data integration.
RESULTS We proposed a novel multi-omics data learning model called MODILM, which integrates multiple omics data to improve the classification accuracy of complex diseases by obtaining more significant and complementary information from different single-omics data. Our approach includes four key steps: 1) constructing a similarity network for each omics data type using the cosine similarity measure, 2) leveraging Graph Attention Networks to learn sample-specific and intra-association features from the similarity networks for single-omics data, 3) using Multilayer Perceptron networks to map the learned features to a new feature space, thereby strengthening and extracting high-level omics-specific features, and 4) fusing these high-level features using a View Correlation Discovery Network to learn cross-omics features in the label space, which results in unique class-level distinctiveness for complex diseases. To demonstrate the effectiveness of MODILM, we conducted experiments on six benchmark datasets consisting of miRNA expression, mRNA, and DNA methylation data. Our results show that MODILM outperforms state-of-the-art methods, effectively improving the accuracy of complex disease classification.
CONCLUSIONS Our MODILM provides a more competitive way to extract and integrate important and complementary information from multiple omics data, providing a very promising tool for supporting decision-making for clinical diagnosis.
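Step 1 of the approach, building a sample similarity network from cosine similarity, can be sketched as follows. The abstract does not say how edges are selected, so the fixed `threshold`, the function names, and the toy samples are assumptions made for illustration; this is not the MODILM implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(samples, threshold=0.9):
    """Return a symmetric adjacency matrix with an edge wherever the
    cosine similarity of two samples reaches the (assumed) threshold."""
    n = len(samples)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(samples[i], samples[j]) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


# Three toy samples from one omics type (rows = samples, columns = features).
samples = [[1.0, 0.0, 2.0], [2.0, 0.1, 4.1], [0.0, 3.0, 0.1]]
adj = similarity_network(samples)
# adj == [[0, 1, 0], [1, 0, 0], [0, 0, 0]]: only the first two samples are linked.
```

One such network would be built per omics type, and a graph model (Graph Attention Networks in the abstract) would then operate on each adjacency matrix.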
Affiliation(s)
- Yating Zhong
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, 530001, China
- Yuzhong Peng
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, 530001, China
- Yanmei Lin
  - School of Environment and Life Science, Nanning Normal University, Nanning, 530001, China
- Dingjia Chen
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, 530001, China
- Hao Zhang
  - School of Computer Science, Fudan University, Shanghai, 200433, China
  - School of Computer, Guangdong University of Petrochemical Technology, Maoming, 525000, China
- Wen Zheng
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, 530001, China
- Yuanyuan Chen
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, 530001, China
- Changliang Wu
  - Department of Spleen, Stomach and Liver Diseases, Guangxi International Zhuang Medical Hospital, Nanning, 530201, China
10
Li H, Wang D, Zhou X, Ding S, Guo W, Zhang S, Li Z, Huang T, Cai YD. Characterization of spleen and lymph node cell types via CITE-seq and machine learning methods. Front Mol Neurosci 2022; 15:1033159. [PMID: 36311013 PMCID: PMC9608858 DOI: 10.3389/fnmol.2022.1033159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022] [Open access]
Abstract
The spleen and lymph nodes are important functional organs of the human immune system. Identifying the cell types of the spleen and lymph nodes helps in understanding the mechanisms of the immune system. However, these cell types are highly diverse in the human body. Therefore, in this study, we employed a series of machine learning algorithms to computationally analyze the cell types of the spleen and lymph nodes based on single-cell CITE-seq sequencing data. A total of 28,211 cells (training vs. test = 14,435 vs. 13,776) involving 24 cell types were collected for this study. The training dataset was analyzed with Boruta and then with minimum redundancy maximum relevance (mRMR), resulting in an mRMR feature list. This list was fed into the incremental feature selection (IFS) method, incorporating four classification algorithms (deep forest, random forest, K-nearest neighbor, and decision tree). Some essential features were discovered, and the deep forest with its optimal features achieved the best performance. A group of related proteins (CD4, TCRb, CD103, CD43, and CD23) and genes (Nkg7 and Thy1) contributing to the classification of spleen and lymph node cell types were analyzed. Furthermore, the classification rules yielded by the decision tree were also provided and analyzed. These findings may provide helpful information for deepening our understanding of the diversity of cell types.
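The incremental feature selection (IFS) step can be illustrated with a small sketch: given a ranked feature list, a classifier is evaluated on the top-1, top-2, ... features and the best-scoring subset is kept. The sketch assumes the ranking is already available (mRMR in the study) and uses a leave-one-out 1-nearest-neighbour classifier as a simple stand-in for the four classifiers named in the abstract; all names and data here are hypothetical.

```python
import math


def loo_accuracy_1nn(X, y, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, math.inf
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feature_idx)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets and return the
    smallest subset with the highest accuracy."""
    best_subset, best_acc = [], -1.0
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        acc = loo_accuracy_1nn(X, y, subset)
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Toy data: features 0 and 1 are informative, feature 2 is noise.
X = [[0.0, 0.0, 5.0], [0.1, 0.1, 0.0], [1.0, 1.0, 5.1],
     [1.1, 0.9, 0.1], [0.5, 0.0, 2.0], [0.6, 1.0, 2.1]]
y = [0, 0, 1, 1, 0, 1]
best_subset, best_acc = incremental_feature_selection(X, y, [0, 1, 2])
# best_subset == [0, 1], best_acc == 1.0
```

On this toy data the first feature alone misclassifies the two middle samples, adding the second feature separates the classes, and adding the noisy third feature lowers accuracy again, so IFS stops at two features.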
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
  - State Key Laboratory of Oncology in South China, Department of Radiology, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Xianchao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, China
- Shiqi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - *Correspondence: Tao Huang
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
  - *Correspondence: Yu-Dong Cai
11
DCE-DForest: A Deep Forest Model for the Prediction of Anticancer Drug Combination Effects. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:8693746. [PMID: 35720022 PMCID: PMC9203182 DOI: 10.1155/2022/8693746] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/06/2022] [Revised: 05/13/2022] [Accepted: 05/23/2022] [Indexed: 11/18/2022]
Abstract
Drug combinations have recently been studied intensively due to their critical role in cancer treatment. Computational prediction of drug synergy has become a popular alternative strategy to experimental methods for anticancer drug synergy predictions. In this paper, a deep learning model called DCE-DForest is proposed to predict the synergistic effect of drug combinations. To sufficiently extract drug information, the paper leverages BERT (Bidirectional Encoder Representations from Transformers) to encode the drug and the deep forest to model the nonlinear relationship between the drugs and cell lines. The experimental results on the synergy datasets demonstrate that the proposed method consistently shows superior performance over the other machine learning models.