1
|
Ajani SN, Mulla RA, Limkar S, Ashtagi R, Wagh SK, Pawar ME. RETRACTED ARTICLE: DLMBHCO: design of an augmented bioinspired deep learning-based multidomain body parameter analysis via heterogeneous correlative body organ analysis. Soft comput 2024; 28:635. [PMID: 37362266 PMCID: PMC10248994 DOI: 10.1007/s00500-023-08613-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Affiliation(s)
- Samir N. Ajani
- Department of Computer Science
& Engineering (Data Science), St.
Vincent Pallotti College of Engineering and Technology,
Nagpur, Maharashtra India
| | - Rais Allauddin Mulla
- Department of Computer Engineering, Vasantdada Patil
Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra India
| | - Suresh Limkar
- Department of Artificial Intelligence
and Data Science, AISSMS Institute of
Information Technology, Pune,
Maharashtra India
| | - Rashmi Ashtagi
- Department of Computer Engineering,
Vishwakarma Institute of Technology,
Bibwewadi, Pune, 411037 Maharashtra
India
| | - Sharmila K. Wagh
- Department of Computer Engineering,
Modern Education Society’s College of
Engineering, Pune, Maharashtra India
| | - Mahendra Eknath Pawar
- Department of Computer Engineering, Vasantdada Patil
Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra India
| |
Collapse
|
2
|
Byeon H, Shabaz M, Ramesh JVN, Dutta AK, Vijay R, Soni M, Patni JC, Rusho MA, Singh PP. Feature fusion-based food protein subcellular prediction for drug composition. Food Chem 2024; 454:139747. [PMID: 38797095 DOI: 10.1016/j.foodchem.2024.139747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Revised: 05/05/2024] [Accepted: 05/18/2024] [Indexed: 05/29/2024]
Abstract
The structure and function of dietary proteins, as well as their subcellular prediction, are critical for designing and developing new drug compositions and understanding the pathophysiology of certain diseases. As a remedy, we provide a subcellular localization method based on feature fusion and clustering for dietary proteins. Additionally, an enhanced PseAAC (Pseudo-amino acid composition) method is suggested, which builds upon the conventional PseAAC. The study initially builds a novel model of representing the food protein sequence by integrating autocorrelation, chi density, and improved PseAAC to better convey information about the food protein sequence. After that, the dimensionality of the fused feature vectors is reduced by using principal component analysis. With prediction accuracies of 99.24% in the Gram-positive dataset and 95.33% in the Gram-negative dataset, respectively, the experimental findings demonstrate the practicability and efficacy of the proposed approach. This paper is basically exploring pseudo-amino acid composition of not any clinical aspect but exploring a pharmaceutical aspect for drug repositioning.
Collapse
Affiliation(s)
- Haewon Byeon
- Department of AI and Software, Inje University, Gimhae 50834, Republic of Korea; Inje University Medical Big Data Research Center, Gimhae 50834, Republic of Korea
| | - Mohammad Shabaz
- Model Institute of Engineering and Technology, Jammu, J&K, India.
| | - Janjhyam Venkata Naga Ramesh
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur Dist., Andhra Pradesh 522302, India
| | - Ashit Kumar Dutta
- Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh 13713, Saudi Arabia.
| | - Richa Vijay
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
| | - Mukesh Soni
- Dr. D. Y. Patil Vidyapeeth, Pune, Dr. D. Y. Patil School of Science & Technology, Tathawade, Pune, India
| | | | - Maher Ali Rusho
- Department of Lockheed Martin Engineering Management, University of Colorado, Boulder, Boulder, CO 80309, USA.
| | | |
Collapse
|
3
|
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Collapse
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China;
| | - Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| |
Collapse
|
4
|
Zou K, Wang S, Wang Z, Zou H, Yang F. Dual-Signal Feature Spaces Map Protein Subcellular Locations Based on Immunohistochemistry Image and Protein Sequence. SENSORS (BASEL, SWITZERLAND) 2023; 23:9014. [PMID: 38005402 PMCID: PMC10675401 DOI: 10.3390/s23229014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 10/29/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Protein is one of the primary biochemical macromolecular regulators in the compartmental cellular structure, and the subcellular locations of proteins can therefore provide information on the function of subcellular structures and physiological environments. Recently, data-driven systems have been developed to predict the subcellular location of proteins based on protein sequence, immunohistochemistry (IHC) images, or immunofluorescence (IF) images. However, the research on the fusion of multiple protein signals has received little attention. In this study, we developed a dual-signal computational protocol by incorporating IHC images into protein sequences to learn protein subcellular localization. Three major steps can be summarized as follows in this protocol: first, a benchmark database that includes 281 proteins sorted out from 4722 proteins of the Human Protein Atlas (HPA) and Swiss-Prot database, which is involved in the endoplasmic reticulum (ER), Golgi apparatus, cytosol, and nucleoplasm; second, discriminative feature operators were first employed to quantitate protein image-sequence samples that include IHC images and protein sequence; finally, the feature subspace of different protein signals is absorbed to construct multiple sub-classifiers via dimensionality reduction and binary relevance (BR), and multiple confidence derived from multiple sub-classifiers is adopted to decide subcellular location by the centralized voting mechanism at the decision layer. The experimental results indicated that the dual-signal model embedded IHC images and protein sequences outperformed the single-signal models with accuracy, precision, and recall of 75.41%, 80.38%, and 74.38%, respectively. It is enlightening for further research on protein subcellular location prediction under multi-signal fusion of protein.
Collapse
Affiliation(s)
- Kai Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Simeng Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
| | - Ziqian Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
| | - Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
- Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang 330038, China
| |
Collapse
|
5
|
Zou K, Wang S, Wang Z, Zhang Z, Yang F. HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units. Front Mol Biosci 2023; 10:1171429. [PMID: 37664182 PMCID: PMC10470064 DOI: 10.3389/fmolb.2023.1171429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
Introduction: Proteins located in subcellular compartments have played an indispensable role in the physiological function of eukaryotic organisms. The pattern of protein subcellular localization is conducive to understanding the mechanism and function of proteins, contributing to investigating pathological changes of cells, and providing technical support for targeted drug research on human diseases. Automated systems based on featurization or representation learning and classifier design have attracted interest in predicting the subcellular location of proteins due to a considerable rise in proteins. However, large-scale, fine-grained protein microscopic images are prone to trapping and losing feature information in the general deep learning models, and the shallow features derived from statistical methods have weak supervision abilities. Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features, whose advanced advantages are summarized in the following three protocols. Firstly, to get discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet based on Hybrid Attention modules and Residual units was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, it not only improves the supervision ability of image information but also enhances the generalization ability of the HAR_Locator through concatenating abstract features and shallow features. Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output results of samples by fitting the most representative result from five subset predictors. Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used to present experimental results, and the accuracy, precision, and recall evaluation indicators were significantly increased to 84.73%, 84.77%, and 84.70%, respectively, compared with baseline predictors.
Collapse
Affiliation(s)
- Kai Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Simeng Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Ziqian Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Zhihai Zhang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
| |
Collapse
|
6
|
Ullah M, Hadi F, Song J, Yu DJ. PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method. Bioinformatics 2023; 39:6839969. [PMID: 36413068 PMCID: PMC9947927 DOI: 10.1093/bioinformatics/btac727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/02/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022] Open
Abstract
MOTIVATION Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages. In this regards, limited efforts have been undertaken. RESULTS We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) to improve its performance by integrating the heterogeneous feature sets. In particular, in the first level of 2L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the first level can output the decision sets based on their respective optimal heterogeneous feature sets, known as 'intermediate decision' sets. Such intermediate decision sets are then ensembled using the mean ensemble method to generate the 'intermediate feature' set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the human protein atlas databank demonstrate the effectiveness of the proposed 2L-SAE-SM framework for the integration of heterogeneous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization. AVAILABILITY AND IMPLEMENTATION https://github.com/csbio-njust-edu/PScL-2LSAESM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Matee Ullah
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Fazal Hadi
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | | | - Dong-Jun Yu
- To whom correspondence should be addressed. or
| |
Collapse
|
7
|
Ullah M, Hadi F, Song J, Yu DJ. PScL-DDCFPred: an ensemble deep learning-based approach for characterizing multiclass subcellular localization of human proteins from bioimage data. Bioinformatics 2022; 38:4019-4026. [PMID: 35771606 PMCID: PMC9890309 DOI: 10.1093/bioinformatics/btac432] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Revised: 06/03/2022] [Accepted: 06/28/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Characterization of protein subcellular localization has become an important and long-standing task in bioinformatics and computational biology, which provides valuable information for elucidating various cellular functions of proteins and guiding drug design. RESULTS Here, we develop a novel bioimage-based computational approach, termed PScL-DDCFPred, to accurately predict protein subcellular localizations in human tissues. PScL-DDCFPred first extracts multiview image features, including global and local features, as base or pure features; next, it applies a new integrative feature selection method based on stepwise discriminant analysis and generalized discriminant analysis to identify the optimal feature sets from the extracted pure features; Finally, a classifier based on deep neural network (DNN) and deep-cascade forest (DCF) is established. Stringent 10-fold cross-validation tests on the new protein subcellular localization training dataset, constructed from the human protein atlas databank, illustrates that PScL-DDCFPred achieves a better performance than several existing state-of-the-art methods. Moreover, the independent test set further illustrates the generalization capability and superiority of PScL-DDCFPred over existing predictors. In-depth analysis shows that the excellent performance of PScL-DDCFPred can be attributed to three critical factors, namely the effective combination of the DNN and DCF models, complementarity of global and local features, and use of the optimal feature sets selected by the integrative feature selection algorithm. AVAILABILITY AND IMPLEMENTATION https://github.com/csbio-njust-edu/PScL-DDCFPred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Matee Ullah
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Fazal Hadi
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| |
Collapse
|
8
|
Wu L, Gao S, Yao S, Wu F, Li J, Dong Y, Zhang Y. Gm-PLoc: A Subcellular Localization Model of Multi-Label Protein Based on GAN and DeepFM. Front Genet 2022; 13:912614. [PMID: 35783287 PMCID: PMC9240597 DOI: 10.3389/fgene.2022.912614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 05/20/2022] [Indexed: 11/13/2022] Open
Abstract
Identifying the subcellular localization of a given protein is an essential part of biological and medical research, since the protein must be localized in the correct organelle to ensure physiological function. Conventional biological experiments for protein subcellular localization have some limitations, such as high cost and low efficiency, thus massive computational methods are proposed to solve these problems. However, some of these methods need to be improved further for protein subcellular localization with class imbalance problem. We propose a new model, generating minority samples for protein subcellular localization (Gm-PLoc), to predict the subcellular localization of multi-label proteins. This model includes three steps: using the position specific scoring matrix to extract distinguishable features of proteins; synthesizing samples of the minority category to balance the distribution of categories based on the revised generative adversarial networks; training a classifier with the rebalanced dataset to predict the subcellular localization of multi-label proteins. One benchmark dataset is selected to evaluate the performance of the presented model, and the experimental results demonstrate that Gm-PLoc performs well for the multi-label protein subcellular localization.
Collapse
Affiliation(s)
- Liwen Wu
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Song Gao
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Shaowen Yao
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Feng Wu
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Jie Li
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Yunyun Dong
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
| | - Yunqi Zhang
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
- School of Software, Yunnan University, Kunming, China
- Yunnan Key Laboratory of Statistical Modeling and Data Analysis, School of Mathematics and Statistics, Yunnan University, Kunming, China
- *Correspondence: Yunqi Zhang,
| |
Collapse
|
9
|
Wang F, Wei L. Multi-scale deep learning for the imbalanced multi-label protein subcellular localization prediction based on immunohistochemistry images. Bioinformatics 2022; 38:2602-2611. [PMID: 35212728 DOI: 10.1093/bioinformatics/btac123] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Revised: 02/09/2022] [Accepted: 02/24/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The development of microscopic imaging techniques enables us to study protein subcellular locations from the tissue level down to the cell level, contributing to the rapid development of image-based protein subcellular location prediction approaches. However, existing methods suffer from intrinsic limitations, such as poor feature representation ability, data imbalanced issue, and multi-label classification problem, greatly impacting the model performance and generalization. RESULTS In this study, we propose MSTLoc, a novel multi-scale end-to-end deep learning model to identify protein subcellular locations in the imbalanced multi-label immunohistochemistry (IHC) images dataset. In our MSTLoc, we deploy a deep convolution neural network to extract multi-scale features from the IHC images, aggregate the high-level features and low-level features via feature fusion to sufficiently exploit the dependencies amongst various subcellular locations, and utilize Vision Transformer (ViT) to model the relationship amongst the features and enhance the feature representation ability. We demonstrate that the proposed MSTLoc achieves better performance than current state-of-the-art models in multi-label subcellular location prediction. Through feature visualization and interpretation analysis, we demonstrate that as compared with the hand-crafted features, the multi-scale deep features learnt from our model exhibit better ability in capturing discriminative patterns underlying protein subcellular locations, and the features from different scales are complementary for the improvement in performance. Finally, case study results indicate that our MSTLoc can successfully identify some biomarkers from proteins that are closely involved with cancer development. For the convenient use of our method, we establish a user-friendly webserver available at http://server.wei-group.net/ MSTLoc. AVAILABILITY AND IMPLEMENTATION http://server.wei-group.net/ MSTLoc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Fengsheng Wang
- School of Software, Shandong University, Jinan, China.,Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China.,Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| |
Collapse
|
10
|
Xu M, Chen Y, Xu Z, Zhang L, Jiang H, Pian C. MiRLoc: predicting miRNA subcellular localization by incorporating miRNA-mRNA interactions and mRNA subcellular localization. Brief Bioinform 2022; 23:6532537. [PMID: 35183063 DOI: 10.1093/bib/bbac044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Revised: 01/17/2022] [Accepted: 01/28/2022] [Indexed: 12/19/2022] Open
Abstract
Subcellular localization of microRNAs (miRNAs) is an important reflection of their biological functions. Considering the spatio-temporal specificity of miRNA subcellular localization, experimental detection techniques are expensive and time-consuming, which strongly motivates an efficient and economical computational method to predict miRNA subcellular localization. In this paper, we describe a computational framework, MiRLoc, to predict the subcellular localization of miRNAs. In contrast to existing methods, MiRLoc uses the functional similarity between miRNAs instead of sequence features and incorporates information about the subcellular localization of the corresponding target mRNAs. The results show that miRNA functional similarity data can be effectively used to predict miRNA subcellular localization, and that inclusion of subcellular localization information of target mRNAs greatly improves prediction performance.
Collapse
Affiliation(s)
- Mingmin Xu
- College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Yuanyuan Chen
- College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Zhihui Xu
- Simcere Diagnostics Co., Ltd., Nanjing, Jiangsu, China
| | - Liangyun Zhang
- College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, Zhejiang, China
| | - Cong Pian
- College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China.,Simcere Diagnostics Co., Ltd., Nanjing, Jiangsu, China
| |
Collapse
|
11
|
Ullah M, Han K, Hadi F, Xu J, Song J, Yu DJ. PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection. Brief Bioinform 2021; 22:bbab278. [PMID: 34337652 PMCID: PMC8574991 DOI: 10.1093/bib/bbab278] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2021] [Revised: 06/30/2021] [Accepted: 07/01/2021] [Indexed: 01/17/2023] Open
Abstract
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods have limited capability in terms of the overall accuracy, time consumption and generalization power. To address these problems, in this study, we developed a novel computational approach based on human protein atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (by employing pretrained deep learning model) features from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To further obtain a more informative feature subset, support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was applied to the integrated feature set. Finally, the classification models, namely support vector machine with radial basis function (SVM-RBF) and support vector machine with linear kernel (SVM-LNR), were learned on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the maximum performance on 10-fold cross validation test on this dataset and showed a better efficacy over existing predictors. Furthermore, we also illustrated the generalization ability of the proposed method by conducting a stringent independent validation test.
Collapse
Affiliation(s)
- Matee Ullah
- Nanjing University of Science and Technology, China
| | - Ke Han
- School of Computer Science and Engineering, Nanjing University of Science and Technology, China
| | - Fazal Hadi
- Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
| | - Jian Xu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, China
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, China
| |
Collapse
|
12
|
Bichindaritz I, Liu G, Bartlett C. Integrative Survival Analysis of Breast Cancer with Gene Expression and DNA Methylation Data. Bioinformatics 2021; 37:2601-2608. [PMID: 33681976 PMCID: PMC8428600 DOI: 10.1093/bioinformatics/btab140] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 01/30/2021] [Accepted: 03/01/2021] [Indexed: 02/03/2023] Open
Abstract
Motivation Integrative multi-feature fusion analysis on biomedical data has gained much attention recently. In breast cancer, existing studies have demonstrated that combining genomic mRNA data and DNA methylation data can better stratify cancer patients with distinct prognosis than using single signature. However, those existing methods are simply combining these gene features in series and have ignored the correlations between separate omics dimensions over time. Results In the present study, we propose an adaptive multi-task learning method, which combines the Cox loss task with the ordinal loss task, for survival prediction of breast cancer patients using multi-modal learning instead of performing survival analysis on each feature dataset. First, we use local maximum quasi-clique merging (lmQCM) algorithm to reduce the mRNA and methylation feature dimensions and extract cluster eigengenes respectively. Then, we add an auxiliary ordinal loss to the original Cox model to improve the ability to optimize the learning process in training and regularization. The auxiliary loss helps to reduce the vanishing gradient problem for earlier layers and helps to decrease the loss of the primary task. Meanwhile, we use an adaptive weights approach to multi-task learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. Finally, we build an ordinal cox hazards model for survival analysis and use long short-term memory (LSTM) method to predict patients’ survival risk. We use the cross-validation method and the concordance index (C-index) for assessing the prediction effect. Stringent cross-verification testing processes for the benchmark dataset and two additional datasets demonstrate that the developed approach is effective, achieving very competitive performance with existing approaches. Availability and implementation https://github.com/bhioswego/ML_ordCOX.
Collapse
Affiliation(s)
- Isabelle Bichindaritz
- Intelligent Bio Systems Laboratory, Biomedical and Health Informatics, Department of Computer Science, State University of New York at Oswego, Syracuse, New York, USA
| | - Guanghui Liu
- Intelligent Bio Systems Laboratory, Biomedical and Health Informatics, Department of Computer Science, State University of New York at Oswego, Syracuse, New York, USA
| | - Christopher Bartlett
- Intelligent Bio Systems Laboratory, Biomedical and Health Informatics, Department of Computer Science, State University of New York at Oswego, Syracuse, New York, USA
| |
Collapse
|
13
|
Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer. BMC Bioinformatics 2020; 21:398. [PMID: 32907537 PMCID: PMC7487883 DOI: 10.1186/s12859-020-03731-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2020] [Accepted: 08/31/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made on measuring abnormal expression intensity in biological samples to identity cancer types and stages. However, the change of subcellular location of proteins, which is also critical for understanding and detecting diseases, has been rarely studied. RESULTS In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers derived from published literatures and the human protein atlas database respectively are employed. It turns out that 81.82 and 65.66% of the biomarker proteins can be identified to change locations. CONCLUSIONS Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.
Collapse
|
14
|
Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method. BMC Bioinformatics 2019; 20:719. [PMID: 31888447 PMCID: PMC6936157 DOI: 10.1186/s12859-019-3232-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Subcellular localization prediction of protein is an important component of bioinformatics, which has great importance for drug design and other applications. A multitude of computational tools for proteins subcellular location have been developed in the recent decades, however, existing methods differ in the protein sequence representation techniques and classification algorithms adopted. RESULTS In this paper, we firstly introduce two kinds of protein sequences encoding schemes: dipeptide information with space and Gapped k-mer information. Then, the Gapped k-mer calculation method which is based on quad-tree is also introduced. CONCLUSIONS >From the prediction results, this method not only reduces the dimension, but also improves the prediction precision of protein subcellular localization.
Collapse
|