1
Zou K, Wang S, Wang Z, Zhang Z, Yang F. HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units. Front Mol Biosci 2023; 10:1171429. [PMID: 37664182 PMCID: PMC10470064 DOI: 10.3389/fmolb.2023.1171429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
Introduction: Proteins located in subcellular compartments have played an indispensable role in the physiological function of eukaryotic organisms. The pattern of protein subcellular localization is conducive to understanding the mechanism and function of proteins, contributing to investigating pathological changes of cells, and providing technical support for targeted drug research on human diseases. Automated systems based on featurization or representation learning and classifier design have attracted interest in predicting the subcellular location of proteins due to a considerable rise in proteins. However, large-scale, fine-grained protein microscopic images are prone to trapping and losing feature information in the general deep learning models, and the shallow features derived from statistical methods have weak supervision abilities.
Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features, whose advanced advantages are summarized in the following three protocols. Firstly, to get discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet based on Hybrid Attention modules and Residual units was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, it not only improves the supervision ability of image information but also enhances the generalization ability of the HAR_Locator through concatenating abstract features and shallow features. Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output results of samples by fitting the most representative result from five subset predictors.
Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used to present experimental results, and the accuracy, precision, and recall evaluation indicators were significantly increased to 84.73%, 84.77%, and 84.70%, respectively, compared with baseline predictors.
Affiliation(s)
- Kai Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Simeng Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Ziqian Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Zhihai Zhang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
  - Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
2
Wang RH, Luo T, Zhang HL, Du PF. PLA-GNN: Computational inference of protein subcellular location alterations under drug treatments with deep graph neural networks. Comput Biol Med 2023; 157:106775. [PMID: 36921458 DOI: 10.1016/j.compbiomed.2023.106775] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 02/21/2023] [Accepted: 03/09/2023] [Indexed: 03/12/2023]
Abstract
Aberrant protein sorting has been observed in many conditions, including complex diseases, drug treatments, and environmental stresses. It is important to systematically identify protein mis-localization events in a given condition. Experimental methods for finding mis-localized proteins are always costly and time consuming. Predicting protein subcellular localizations has been studied for many years. However, only a handful of existing works considered protein subcellular location alterations. We proposed a computational method for identifying alterations of protein subcellular locations under drug treatments. We took three drugs, TSA (trichostatin A), bortezomib and tacrolimus, as instances for this study. By introducing dynamic protein-protein interaction networks, graph neural network algorithms were applied to aggregate topological information under different conditions. We systematically reported potential protein mis-localization events under drug treatments. As far as we know, this is the first attempt to find protein mis-localization events computationally in drug treatment conditions. The literature validated that a number of proteins, which are highly related to pharmacological mechanisms of these drugs, may undergo protein localization alterations. We name our method PLA-GNN (Protein Localization Alteration by Graph Neural Networks). It can be extended to other drugs and other conditions. All datasets and code from this study have been deposited in a GitHub repository (https://github.com/quinlanW/PLA-GNN).
Affiliation(s)
- Ren-Hua Wang
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Tao Luo
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Han-Lin Zhang
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
3
Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, Zhu F. Application of Machine Learning in Spatial Proteomics. J Chem Inf Model 2022; 62:5875-5895. [PMID: 36378082 DOI: 10.1021/acs.jcim.2c01161] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/16/2022]
Abstract
Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Considerable evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow the reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are comprehensively introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges existing in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate insightful understanding of cell biology as well as future research in medical and drug discovery communities.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
4
Gonschior H, Schmied C, Van der Veen RE, Eichhorst J, Himmerkus N, Piontek J, Günzel D, Bleich M, Furuse M, Haucke V, Lehmann M. Nanoscale segregation of channel and barrier claudins enables paracellular ion flux. Nat Commun 2022; 13:4985. [PMID: 36008380 PMCID: PMC9411157 DOI: 10.1038/s41467-022-32533-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/06/2021] [Accepted: 08/04/2022] [Indexed: 11/09/2022] Open
Abstract
The paracellular passage of ions and small molecules across epithelia is controlled by tight junctions, complex meshworks of claudin polymers that form tight seals between neighboring cells. How the nanoscale architecture of tight junction meshworks enables paracellular passage of specific ions or small molecules without compromising barrier function is unknown. Here we combine super-resolution stimulated emission depletion microscopy in live and fixed cells and tissues, multivariate classification of super-resolution images and fluorescence resonance energy transfer to reveal the nanoscale organization of tight junctions formed by mammalian claudins. We show that only a subset of claudins can assemble into characteristic homotypic meshworks, whereas tight junctions formed by multiple claudins display nanoscale organization principles of intermixing, integration, induction, segregation, and exclusion of strand assemblies. Interestingly, channel-forming claudins are spatially segregated from barrier-forming claudins via determinants mainly encoded in their extracellular domains also known to harbor mutations leading to human diseases. Electrophysiological analysis of claudins in epithelial cells suggests that nanoscale segregation of distinct channel-forming claudins enables barrier function combined with specific paracellular ion flux across tight junctions. Meshworks of claudin polymers control the paracellular transport and barrier properties of epithelial tight junctions. Here, the authors show different claudin nanoscale organization principles, finding that claudin segregation enables barrier formation and paracellular ion flux across tight junctions.
Affiliation(s)
- Hannes Gonschior
  - Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
- Christopher Schmied
  - Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
- Jenny Eichhorst
  - Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
- Nina Himmerkus
  - Institute of Physiology, Christian-Albrechts-Universität zu Kiel, 24118, Kiel, Germany
- Jörg Piontek
  - Clinical Physiology/Nutritional Medicine, Medical Department, Division of Gastroenterology, Infectiology, Rheumatology, Charité - Universitätsmedizin Berlin, 12203, Berlin, Germany
- Dorothee Günzel
  - Clinical Physiology/Nutritional Medicine, Medical Department, Division of Gastroenterology, Infectiology, Rheumatology, Charité - Universitätsmedizin Berlin, 12203, Berlin, Germany
- Markus Bleich
  - Institute of Physiology, Christian-Albrechts-Universität zu Kiel, 24118, Kiel, Germany
- Mikio Furuse
  - Division of Cell Structure, National Institute for Physiological Sciences, Okazaki, Aichi, 444-8787, Japan
  - Department of Physiological Sciences, School of Life Science, SOKENDAI (Graduate University for Advanced Studies), Okazaki, Aichi, 444-8585, Japan
- Volker Haucke
  - Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
  - Faculty of Biology, Chemistry and Pharmacy, Freie Universität Berlin, 14195, Berlin, Germany
- Martin Lehmann
  - Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany.
5
Nanni L, Paci M, Brahnam S, Lumini A. Feature transforms for image data augmentation. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07645-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
A problem with convolutional neural networks (CNNs) is that they require large datasets to obtain adequate robustness; on small datasets, they are prone to overfitting. Many methods have been proposed to overcome this shortcoming with CNNs. In cases where additional samples cannot easily be collected, a common approach is to generate more data points from existing data using an augmentation technique. In image classification, many augmentation approaches utilize simple image manipulation algorithms. In this work, we propose some new methods for data augmentation based on several image transformations: the Fourier transform (FT), the Radon transform (RT), and the discrete cosine transform (DCT). These and other data augmentation methods are considered in order to quantify their effectiveness in creating ensembles of neural networks. The novelty of this research is to consider different strategies for data augmentation to generate training sets from which to train several classifiers which are combined into an ensemble. Specifically, the idea is to create an ensemble based on a kind of bagging of the training set, where each model is trained on a different training set obtained by augmenting the original training set with different approaches. We build ensembles on the data level by adding images generated by combining fourteen augmentation approaches, with three based on FT, RT, and DCT, proposed here for the first time. Pretrained ResNet50 networks are finetuned on training sets that include images derived from each augmentation method. These networks and several fusions are evaluated and compared across eleven benchmarks. Results show that building ensembles on the data level by combining different data augmentation methods produces classifiers that not only compete with the state-of-the-art but often surpass the best approaches reported in the literature.
6
Nanni L, Brahnam S, Paci M, Ghidoni S. Comparison of Different Convolutional Neural Network Activation Functions and Methods for Building Ensembles for Small to Midsize Medical Data Sets. Sensors (Basel) 2022; 22:s22166129. [PMID: 36015898 PMCID: PMC9415767 DOI: 10.3390/s22166129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2022] [Revised: 08/09/2022] [Accepted: 08/12/2022] [Indexed: 05/08/2023]
Abstract
CNNs and other deep learners are now state-of-the-art in medical imaging research. However, the small sample size of many medical data sets dampens performance and results in overfitting. In some medical areas, it is simply too labor-intensive and expensive to amass images numbering in the hundreds of thousands. Building deep CNN ensembles of pre-trained CNNs is one powerful method for overcoming this problem. Ensembles combine the outputs of multiple classifiers to improve performance. This method relies on the introduction of diversity, which can be introduced on many levels in the classification workflow. A recent ensembling method that has shown promise is to vary the activation functions in a set of CNNs or within different layers of a single CNN. This study aims to examine the performance of both methods using a large set of twenty activation functions, six of which are presented here for the first time: 2D Mexican ReLU, TanELU, MeLU + GaLU, Symmetric MeLU, Symmetric GaLU, and Flexible MeLU. The proposed method was tested on fifteen medical data sets representing various classification tasks. The best performing ensemble combined two well-known CNNs (VGG16 and ResNet50) whose standard ReLU activation layers were randomly replaced with another. Results demonstrate the superiority in performance of this approach.
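The statement that ensembles combine the outputs of multiple classifiers can be illustrated with a minimal sketch. It assumes the simplest fusion rule, averaging per-class probabilities across models and taking the arg-max; the abstract does not say which fusion rule the authors used, and the function names and probability values below are hypothetical.

```python
from typing import List, Sequence


def average_fusion(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities produced by several models.

    Assumption: every model outputs a distribution over the same classes.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def predict(model_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    fused = average_fusion(model_probs)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs of three networks for one image (three classes).
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
label = predict(outputs)
```

In this toy case the first network alone would pick class 0, while the averaged ensemble picks class 1, which is the kind of disagreement that diversity among ensemble members is meant to exploit.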
Affiliation(s)
- Loris Nanni
  - Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Sheryl Brahnam
  - Department of Information Technology and Cybersecurity, Missouri State University, 901 S. National Street, Springfield, MO 65804, USA
- Michelangelo Paci
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, D 219, FI-33520 Tampere, Finland
- Stefano Ghidoni
  - Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
7
Wang G, Xue MQ, Shen HB, Xu YY. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks. Brief Bioinform 2022; 23:6499983. [PMID: 35018423 DOI: 10.1093/bib/bbab539] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 11/03/2021] [Accepted: 11/20/2021] [Indexed: 11/13/2022] Open
Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, thereby producing plenty of automated predictors for protein subcellular localization. However, most of these predictors are trained solely from high-throughput microscopic images or protein amino acid sequences. Unifying heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types including images of protein expression in cells or tissues, amino acid sequences and protein-protein interaction networks, to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicated that optimal integrations can considerably enhance the classification accuracy, and the utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we also investigate the contribution of different data sources and the influence of partial absence of data. This work is anticipated to provide clues for reconciliation and combination of multi-source data for protein location analysis.
Affiliation(s)
- Ge Wang
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ying-Ying Xu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
8
Ullah M, Han K, Hadi F, Xu J, Song J, Yu DJ. PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection. Brief Bioinform 2021; 22:bbab278. [PMID: 34337652 PMCID: PMC8574991 DOI: 10.1093/bib/bbab278] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 04/11/2021] [Revised: 06/30/2021] [Accepted: 07/01/2021] [Indexed: 01/17/2023] Open
Abstract
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods have limited capability in terms of the overall accuracy, time consumption and generalization power. To address these problems, in this study, we developed a novel computational approach based on human protein atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (by employing pretrained deep learning model) features from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To further obtain a more informative feature subset, support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was applied to the integrated feature set. Finally, the classification models, namely support vector machine with radial basis function (SVM-RBF) and support vector machine with linear kernel (SVM-LNR), were learned on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the maximum performance on 10-fold cross validation test on this dataset and showed a better efficacy over existing predictors. Furthermore, we also illustrated the generalization ability of the proposed method by conducting a stringent independent validation test.
Affiliation(s)
- Matee Ullah
  - Nanjing University of Science and Technology, China
- Ke Han
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, China
- Fazal Hadi
  - Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
- Jian Xu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, China
9
Maurya R, Pathak VK, Dutta MK. Deep learning based microscopic cell images classification framework using multi-level ensemble. Comput Methods Programs Biomed 2021; 211:106445. [PMID: 34627021 DOI: 10.1016/j.cmpb.2021.106445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2021] [Accepted: 09/26/2021] [Indexed: 06/13/2023]
Abstract
Background and objectives: Advancement of the ultra-fast microscopic images acquisition and generation techniques give rise to the automated artificial intelligence (AI)-based microscopic images classification systems. The earlier cell classification systems classify the cell images of a specific type captured using a specific microscopy technique, therefore the motivation behind the present study is to develop a generic framework that can be used for the classification of cell images of multiple types captured using a variety of microscopic techniques.
Methods: The proposed framework for microscopic cell images classification is based on the transfer learning-based multi-level ensemble approach. The ensemble is made by training the same base model with different optimisation methods and different learning rates. An important contribution of the proposed framework lies in its ability to capture different granularities of features extracted from multiple scales of an input microscopic cell image. The base learners used in the proposed ensemble encapsulates the aggregation of low-level coarse features and high-level semantic features, thus, represent the different granular microscopic cell image features present at different scales of input cell images. The batch normalisation layer has been added to the base models for the fast convergence in the proposed ensemble for microscopic cell images classification.
Results: The general applicability of the proposed framework for microscopic cell image classification has been tested with five different public datasets. The proposed method has outperformed the experimental results obtained in several other similar works.
Conclusions: The proposed framework for microscopic cell classification outperforms the other state-of-the-art classification methods in the same domain with a comparatively lesser amount of training data.
Affiliation(s)
- Ritesh Maurya
  - Centre for Advanced Studies, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India.
- Malay Kishore Dutta
  - Centre for Advanced Studies, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India.
10
Chen J, Hou J, Wong KC. Categorical Matrix Completion With Active Learning for High-Throughput Screening. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2261-2270. [PMID: 32203025 DOI: 10.1109/tcbb.2020.2982142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
The recent advances in wet-lab automation enable high-throughput experiments to be conducted seamlessly. In particular, the exhaustive enumeration of all possible conditions is always involved in high-throughput screening. Nonetheless, such a screening strategy is hardly believed to be optimal and cost-effective. By incorporating artificial intelligence, we design an open-source model based on categorical matrix completion and active machine learning to guide high-throughput screening experiments. Specifically, we narrow our scope to the high-throughput screening for chemical compound effects on diverse protein sub-cellular locations. In the proposed model, we believe that exploration is more important than exploitation in the long run of high-throughput screening experiments. Therefore, we design several innovations to circumvent the existing limitations. In particular, categorical matrix completion is designed to accurately impute the missing experiments while margin sampling is also implemented for uncertainty estimation. The model is systematically tested on both simulated and real data. The simulation results reflect that our model can be robust to diverse scenarios, while the real data results demonstrate the wet-lab applicability of our model for high-throughput screening experiments. Lastly, we attribute the model success to its exploration ability by revealing the related matrix ranks and distinct experiment coverage comparisons.
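Margin sampling, mentioned in the abstract as the uncertainty estimate, can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation: it assumes each untested experiment already has a predicted class distribution (for example from the matrix-completion step) with at least two classes, and the function names and numbers are hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities; a small gap means high uncertainty."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def margin_sampling(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k candidates with the smallest margin."""
    order = sorted(range(len(prob_rows)), key=lambda i: margin(prob_rows[i]))
    return order[:k]


# Hypothetical predicted distributions for four untested experiments.
predicted = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
]
chosen = margin_sampling(predicted, 2)
```

Here the third and second candidates are selected because their top two classes are nearly tied, so running those experiments would be expected to be the most informative for the model.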
11
Maurya R, Pathak VK, Burget R, Dutta MK. Automated detection of bioimages using novel deep feature fusion algorithm and effective high-dimensional feature selection approach. Comput Biol Med 2021; 137:104862. [PMID: 34534793 DOI: 10.1016/j.compbiomed.2021.104862] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/27/2021] [Revised: 08/26/2021] [Accepted: 09/07/2021] [Indexed: 11/30/2022]
Abstract
The classification of bioimages plays an important role in several biological studies, such as subcellular localisation, phenotype identification and other types of histopathological examinations. The objective of the present study was to develop a computer-aided bioimage classification method for the classification of bioimages across nine diverse benchmark datasets. A novel algorithm was developed, which systematically fused the features extracted from nine different convolution neural network architectures. A systematic fusion of features boosts the performance of a classifier but at the cost of the high dimensionality of the fused feature set. Therefore, non-discriminatory and redundant features need to be removed from a high-dimensional fused feature set to improve the classification performance and reduce the time complexity. To achieve this aim, a method based on analysis of variance and evolutionary feature selection was developed to select an optimal set of discriminatory features from the fused feature set. The proposed method was evaluated on nine different benchmark datasets. The experimental results showed that the proposed method achieved superior performance, with a significant reduction in the dimensionality of the fused feature set for most bioimage datasets. The performance of the proposed feature selection method was better than that of some of the most recent and classical methods used for feature selection. Thus, the proposed method was desirable because of its superior performance and high compression ratio, which significantly reduced the computational complexity.
Affiliation(s)
- Ritesh Maurya
  - Centre for Advanced Studies, Dr A.P.J. Abdul Kalam Technical University, Lucknow, India.
- Radim Burget
  - Department of Telecommunications, Faculty of Electrical Engineering and Communication, BRNO University of Technology, Czech Republic.
- Malay Kishore Dutta
  - Centre for Advanced Studies, Dr A.P.J. Abdul Kalam Technical University, Lucknow, India.
12
Christopher JA, Stadler C, Martin CE, Morgenstern M, Pan Y, Betsinger CN, Rattray DG, Mahdessian D, Gingras AC, Warscheid B, Lehtiö J, Cristea IM, Foster LJ, Emili A, Lilley KS. Subcellular proteomics. Nat Rev Methods Primers 2021; 1:32. [PMID: 34549195 PMCID: PMC8451152 DOI: 10.1038/s43586-021-00029-y] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Accepted: 03/15/2021] [Indexed: 12/11/2022]
Abstract
The eukaryotic cell is compartmentalized into subcellular niches, including membrane-bound and membrane-less organelles. Proteins localize to these niches to fulfil their function, enabling discrete biological processes to occur in synchrony. Dynamic movement of proteins between niches is essential for cellular processes such as signalling, growth, proliferation, motility and programmed cell death, and mutations causing aberrant protein localization are associated with a wide range of diseases. Determining the location of proteins in different cell states and cell types and how proteins relocalize following perturbation is important for understanding their functions, related cellular processes and pathologies associated with their mislocalization. In this Primer, we cover the major spatial proteomics methods for determining the location, distribution and abundance of proteins within subcellular structures. These technologies include fluorescent imaging, protein proximity labelling, organelle purification and cell-wide biochemical fractionation. We describe their workflows, data outputs and applications in exploring different cell biological scenarios, and discuss their main limitations. Finally, we describe emerging technologies and identify areas that require technological innovation to allow better characterization of the spatial proteome.
Affiliation(s)
- Josie A. Christopher
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
- Charlotte Stadler
  - Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Claire E. Martin
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Marcel Morgenstern
  - Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Yanbo Pan
  - Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Cora N. Betsinger
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- David G. Rattray
  - Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Diana Mahdessian
  - Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Anne-Claude Gingras
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Bettina Warscheid
  - Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
  - BIOSS and CIBSS Signaling Research Centers, University of Freiburg, Freiburg, Germany
- Janne Lehtiö
  - Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Ileana M. Cristea
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Leonard J. Foster
  - Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Andrew Emili
  - Center for Network Systems Biology, Boston University, Boston, MA, USA
- Kathryn S. Lilley
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
13
Veschini L, Sailem H, Malani D, Pietiäinen V, Stojiljkovic A, Wiseman E, Danovi D. High-Content Imaging to Phenotype Human Primary and iPSC-Derived Cells. Methods Mol Biol 2021; 2185:423-445. [PMID: 33165865 DOI: 10.1007/978-1-0716-0810-4_27] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]
Abstract
Increasingly powerful microscopy, liquid handling, and computational techniques have enabled cell imaging in high throughput. Microscopy images are quantified using high-content analysis platforms linking object features to cell behavior. This can be attempted on physiologically relevant cell models, including stem cells and primary cells, in complex environments, and conceivably in the presence of perturbations. Recently, substantial focus has been devoted to cell profiling for cell therapy, assays for drug discovery or biomarker identification for clinical decision-making protocols, bringing this wealth of information into translational applications. In this chapter, we focus on two protocols enabling to (1) benchmark human cells, in particular human endothelial cells as a case study and (2) extract cells from blood for follow-up experiments including image-based drug testing. We also present concepts of high-content imaging and discuss the benefits and challenges, with the aim of enabling readers to tailor existing pipelines and bring such approaches closer to translational research and the clinic.
Affiliation(s)
- Lorenzo Veschini
  - Academic Centre of Reconstructive Science, Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, UK
- Heba Sailem
  - The Institute of Biomedical Engineering, Oxford, UK
- Disha Malani
  - Institute for Molecular Medicine Finland-FIMM, Helsinki Institute of Life Science-HiLIFE, University of Helsinki, Helsinki, Finland
- Vilja Pietiäinen
  - Institute for Molecular Medicine Finland-FIMM, Helsinki Institute of Life Science-HiLIFE, University of Helsinki, Helsinki, Finland
- Ana Stojiljkovic
  - Division of Veterinary Anatomy, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Erika Wiseman
  - Stem Cell Hotel, Centre for Stem Cells and Regenerative Medicine, King's College London, London, UK
- Davide Danovi
  - Stem Cell Hotel, Centre for Stem Cells and Regenerative Medicine, King's College London, London, UK.
14
Xu YY, Zhou H, Murphy RF, Shen HB. Consistency and variation of protein subcellular location annotations. Proteins 2020; 89:242-250. [PMID: 32935893 DOI: 10.1002/prot.26010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/16/2020] [Revised: 07/09/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022]
Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
Affiliation(s)
- Ying-Ying Xu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Hang Zhou
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Robert F Murphy
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
15
Lundberg E, Borner GHH. Spatial proteomics: a powerful discovery tool for cell biology. Nat Rev Mol Cell Biol 2020; 20:285-302. [PMID: 30659282 DOI: 10.1038/s41580-018-0094-y] [Citation(s) in RCA: 343] [Impact Index Per Article: 68.6] [Indexed: 01/07/2023]
Abstract
Protein subcellular localization is tightly controlled and intimately linked to protein function in health and disease. Capturing the spatial proteome - that is, the localizations of proteins and their dynamics at the subcellular level - is therefore essential for a complete understanding of cell biology. Owing to substantial advances in microscopy, mass spectrometry and machine learning applications for data analysis, the field is now mature for proteome-wide investigations of spatial cellular regulation. Studies of the human proteome have begun to reveal a complex architecture, including single-cell variations, dynamic protein translocations, changing interaction networks and proteins localizing to multiple compartments. Furthermore, several studies have successfully harnessed the power of comparative spatial proteomics as a discovery tool to unravel disease mechanisms. We are at the beginning of an era in which spatial proteomics finally integrates with cell biology and medical research, thereby paving the way for unbiased systems-level insights into cellular processes. Here, we discuss current methods for spatial proteomics using imaging or mass spectrometry and specifically highlight global comparative applications. The aim of this Review is to survey the state of the field and also to encourage more cell biologists to apply spatial proteomics approaches.
Affiliation(s)
- Emma Lundberg
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden; Department of Genetics, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA.
- Georg H H Borner
  - Max Planck Institute of Biochemistry, Department of Proteomics and Signal Transduction, Martinsried, Germany.
16
Yang F, Liu Y, Wang Y, Yin Z, Yang Z. MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy. BMC Bioinformatics 2019; 20:522. [PMID: 31655541 PMCID: PMC6815465 DOI: 10.1186/s12859-019-3136-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/14/2019] [Accepted: 10/09/2019] [Indexed: 12/20/2022] Open
Abstract
Background: Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and combine with the corresponding molecules to fulfill their functions. Furthermore, prediction of protein subcellular location not only should be a guiding role in drug design and development due to potential molecular targets but also be an essential role in genome annotation. Taking the current status of image-based protein subcellular localization as an example, there are three common drawbacks, i.e., obsolete datasets without updating label information, stereotypical feature descriptor on spatial domain or grey level, and single-function prediction algorithm’s limited capacity of handling single-label database. Results: In this paper, a novel human protein subcellular localization prediction model MIC_Locator is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset instead of obsolete data while training prediction model. Secondly, Fourier transformation, Riesz transformation, Log-Gabor filter and intensity coding strategy are employed to obtain frequency feature based on three components of monogenic signal with different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experiment results showed that the MIC_Locator can achieve 60.56% subset accuracy and outperform the existing majority of prediction models, and the frequency feature and intensity coding strategy can be conducive to improving the classification accuracy. Conclusions: Our results demonstrate that the frequency feature is more beneficial for improving the performance of model compared to features extracted from spatial domain, and the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.
Affiliation(s)
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA.
- Yang Liu
  - School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
- Yanbin Wang
  - School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
- Zhen Yang
  - School of Communications and Electronics, Jiangxi Science & Technology Normal University, Nanchang, 330003, China
17
Xiang S, Liang Q, Hu Y, Tang P, Coppola G, Zhang D, Sun W. AMC-Net: Asymmetric and multi-scale convolutional neural network for multi-label HPA classification. Comput Methods Programs Biomed 2019; 178:275-287. [PMID: 31416555 DOI: 10.1016/j.cmpb.2019.07.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/20/2019] [Revised: 06/20/2019] [Accepted: 07/06/2019] [Indexed: 06/10/2023]
Abstract
Background and Objectives: The multi-label Human Protein Atlas (HPA) classification can yield a better understanding of human diseases and help doctors to enhance the automatic analysis of biomedical images. The existing automatic protein recognition methods have been limited to single pattern. Therefore, an automatic multi-label human protein atlas recognition system with satisfactory performance should be conducted. This work aims to build an automatic recognition system for multi-label human protein atlas classification based on deep learning. Methods: In this work, an automatic feature extraction and multi-label classification framework is proposed. Specifically, an asymmetric and multi-scale convolutional neural network is designed for HPA classification. Furthermore, this work introduces a combined loss that consists of the binary cross-entropy and F1-score losses to improve identification performance. Results: Rigorous experiments are conducted to estimate the proposed system. In particular, unlike the current automatic identification systems, which focus on a limited number of patterns, the proposed method is capable of classifying mixed patterns of proteins in microscope images and can handle the subcellular multi-label protein classification task including 28 subcellular localization patterns. The proposed framework based on deep convolutional neural network outperformed the existing approaches with a F1-score of 0.823, which illustrates the robustness and effectiveness of the proposed system. Conclusion: This study proposed a high-performance recognition system for protein atlas classification based on deep learning, and it achieved an automatic multi-label human protein atlas identification framework with superior performance than previous studies.
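As a rough illustration of the combined loss mentioned in this abstract, the following Python sketch adds a binary cross-entropy term to a differentiable ("soft") F1 term. The abstract does not give the exact formulation, so the soft-F1 definition, the weighting parameter `alpha`, and all function names are assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all labels (probabilities are clipped)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def soft_f1_loss(y_true, y_prob, eps=1e-7):
    """1 - soft F1, using probabilities in place of hard 0/1 predictions."""
    tp = sum(t * p for t, p in zip(y_true, y_prob))
    fp = sum((1 - t) * p for t, p in zip(y_true, y_prob))
    fn = sum(t * (1 - p) for t, p in zip(y_true, y_prob))
    return 1.0 - (2 * tp) / (2 * tp + fp + fn + eps)


def combined_loss(y_true, y_prob, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical mixing weight."""
    return alpha * bce_loss(y_true, y_prob) + (1 - alpha) * soft_f1_loss(y_true, y_prob)


if __name__ == "__main__":
    # Three labels for one image: confident, correct predictions score lower.
    print(combined_loss([1, 0, 1], [0.9, 0.1, 0.8]))
    print(combined_loss([1, 0, 1], [0.5, 0.5, 0.5]))
```

The cross-entropy term gives a per-label gradient, while the F1 term rewards the overlap between predicted and true label sets; one plausible motivation for mixing them is that multi-label data of this kind is usually imbalanced across classes.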
Affiliation(s)
- Shao Xiang
  - College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Qiaokang Liang
  - College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China.
- Yucheng Hu
  - College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Pen Tang
  - College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Gianmarc Coppola
  - Faculty of Engineering and Applied Science, University of Ontario Institute of Technology, Oshawa, Ontario, L1H 7K4, Canada
- Dan Zhang
  - Department of Mechanical Engineering, York University, Toronto, ON M3J 1P3, Canada
- Wei Sun
  - College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
18
Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat Biotechnol 2018; 36:820-828. [PMID: 30125267 DOI: 10.1038/nbt.4225] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 12/24/2017] [Accepted: 07/19/2018] [Indexed: 01/11/2023]
Abstract
Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game, named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns and can deal efficiently with multi-localization proteins, performing robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
19
Godinez WJ, Hossain I, Lazic SE, Davies JW, Zhang X. A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics 2018; 33:2010-2019. [PMID: 28203779 DOI: 10.1093/bioinformatics/btx069] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Received: 10/03/2016] [Accepted: 02/13/2017] [Indexed: 12/27/2022] Open
Abstract
Motivation: Identifying phenotypes based on high-content cellular images is challenging. Conventional image analysis pipelines for phenotype identification comprise multiple independent steps, with each step requiring method customization and adjustment of multiple parameters. Results: Here, we present an approach based on a multi-scale convolutional neural network (M-CNN) that classifies, in a single cohesive step, cellular images into phenotypes by using directly and solely the images' pixel intensity values. The only parameters in the approach are the weights of the neural network, which are automatically optimized based on training images. The approach requires no a priori knowledge or manual customization, and is applicable to single- or multi-channel images displaying single or multiple cells. We evaluated the classification performance of the approach on eight diverse benchmark datasets. The approach yielded overall a higher classification accuracy compared with state-of-the-art results, including those of other deep CNN architectures. In addition to using the network to simply obtain a yes-or-no prediction for a given phenotype, we use the probability outputs calculated by the network to quantitatively describe the phenotypes. This study shows that these probability values correlate with chemical treatment concentrations. This finding validates further our approach and enables chemical treatment potency estimation via CNNs. Availability and Implementation: The network specifications and solver definitions are provided in Supplementary Software 1. Contact: william_jose.godinez_navarro@novartis.com or xian-1.zhang@novartis.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- William J Godinez
  - Novartis Institutes for BioMedical Research Inc., Basel, Switzerland
- Imtiaz Hossain
  - Novartis Institutes for BioMedical Research Inc., Basel, Switzerland
- Stanley E Lazic
  - Novartis Institutes for BioMedical Research Inc., Basel, Switzerland
- John W Davies
  - Novartis Institutes for BioMedical Research Inc., Cambridge, MA, USA
- Xian Zhang
  - Novartis Institutes for BioMedical Research Inc., Basel, Switzerland
20
Lin D, Sun L, Toh KA, Zhang JB, Lin Z. Biomedical image classification based on a cascade of an SVM with a reject option and subspace analysis. Comput Biol Med 2018; 96:128-140. [PMID: 29567484 DOI: 10.1016/j.compbiomed.2018.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/10/2017] [Revised: 03/07/2018] [Accepted: 03/07/2018] [Indexed: 11/26/2022]
Abstract
Automated biomedical image classification could confront the challenges of high level noise, image blur, illumination variation and complicated geometric correspondence among various categorical biomedical patterns in practice. To handle these challenges, we propose a cascade method consisting of two stages for biomedical image classification. At stage 1, we propose a confidence score based classification rule with a reject option for a preliminary decision using the support vector machine (SVM). The testing images going through stage 1 are separated into two groups based on their confidence scores. Those testing images with sufficiently high confidence scores are classified at stage 1 while the others with low confidence scores are rejected and fed to stage 2. At stage 2, the rejected images from stage 1 are first processed by a subspace analysis technique called eigenfeature regularization and extraction (ERE), and then classified by another SVM trained in the transformed subspace learned by ERE. At both stages, images are represented based on two types of local features, i.e., SIFT and SURF, respectively. They are encoded using various bag-of-words (BoW) models to handle biomedical patterns with and without geometric correspondence, respectively. Extensive experiments are implemented to evaluate the proposed method on three benchmark real-world biomedical image datasets. The proposed method significantly outperforms several competing state-of-the-art methods in terms of classification accuracy.
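The two-stage routing described in this abstract can be sketched in a few lines of Python. This is only a schematic of the reject-option idea: the classifiers are passed in as plain callables returning a `(label, confidence)` pair, and the function name `cascade_predict` and the `threshold` value are hypothetical. The SVMs, the ERE subspace step, and the SIFT/SURF bag-of-words features from the paper are not reproduced here.

```python
def cascade_predict(samples, stage1, stage2, threshold=0.8):
    """Classify each sample at stage 1 unless its confidence is too low.

    stage1 and stage2 are callables mapping a sample to (label, confidence).
    Returns a list of (label, stage_used) pairs.
    """
    results = []
    for x in samples:
        label, confidence = stage1(x)
        if confidence >= threshold:
            # Accepted: the stage-1 decision is final.
            results.append((label, 1))
        else:
            # Rejected: defer the decision to the second-stage classifier.
            label2, _ = stage2(x)
            results.append((label2, 2))
    return results


if __name__ == "__main__":
    # Toy stand-ins for the two classifiers (assumptions, not trained models).
    def toy_stage1(x):
        return ("A", 0.9) if x > 0 else ("A", 0.4)

    def toy_stage2(x):
        return ("B", 1.0)

    print(cascade_predict([1, -1], toy_stage1, toy_stage2))
    # → [('A', 1), ('B', 2)]
```

The threshold trades stage-1 coverage against accuracy: a higher value sends more samples to the second stage.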
Affiliation(s)
- Dongyun Lin
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
- Lei Sun
  - School of Information and Electronics, Beijing Institute of Technology, Beijing, 100081, PR China
- Kar-Ann Toh
  - School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, South Korea
- Jing Bo Zhang
  - AEBC, Nanyang Environment and Water Research Institute, Nanyang Technological University, Singapore
- Zhiping Lin
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.
21
Nanni L, Brahnam S, Ghidoni S, Lumini A. Bioimage Classification with Handcrafted and Learned Features. IEEE/ACM Trans Comput Biol Bioinform 2018; 16:874-885. [PMID: 29994096 DOI: 10.1109/tcbb.2018.2821127] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
Bioimage classification is increasingly becoming more important in many biological studies including those that require accurate cell phenotype recognition, subcellular localization, and histopathological classification. In this paper, we present a new General Purpose (GenP) bioimage classification method that can be applied to a large range of classification problems. The GenP system we propose is an ensemble that combines multiple texture features (both handcrafted and learned descriptors) for superior and generalizable discriminative power. Our ensemble obtains a boosting of performance by combining local features, dense sampling features, and deep learning features. Each descriptor is used to train a different Support Vector Machine that is then combined by sum rule. We evaluate our method on a diverse set of bioimage classification tasks each represented by a benchmark database, including some of those available in the IICBU 2008 database. Each bioimage classification task represents a typical subcellular, cellular, and tissue level classification problem. Our evaluation on these datasets demonstrates that the proposed GenP bioimage ensemble obtains state-of-the-art performance without any ad-hoc dataset tuning of the parameters (thereby avoiding any risk of overfitting/overtraining). To reproduce the experiments reported in this paper, the MATLAB code of all the descriptors is available at https://github.com/LorisNanni and https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0.
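The sum-rule fusion mentioned in this abstract can be illustrated with a short Python sketch: each classifier contributes one score per class, the scores are added class by class, and the class with the largest total is chosen. The function name `sum_rule` and the example scores are assumptions for illustration. In practice the per-classifier scores would typically be normalized to a comparable range before summing, which this sketch takes as given.

```python
def sum_rule(score_lists):
    """Fuse per-classifier class scores by summation.

    score_lists: one list of class scores per classifier, all the same length.
    Returns (index of the winning class, list of fused scores).
    """
    n_classes = len(score_lists[0])
    fused = [sum(scores[c] for scores in score_lists) for c in range(n_classes)]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Hypothetical scores from three classifiers on a two-class problem.
    label, fused = sum_rule([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
    print(label, fused)
```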
22
Data-analysis strategies for image-based cell profiling. Nat Methods 2017; 14:849-863. [PMID: 28858338 PMCID: PMC6871000 DOI: 10.1038/nmeth.4397] [Citation(s) in RCA: 436] [Impact Index Per Article: 54.5] [Received: 05/19/2016] [Accepted: 07/28/2017] [Indexed: 12/16/2022]
Abstract
Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences.
23
Song Y, Li Q, Huang H, Feng D, Chen M, Cai W. Low Dimensional Representation of Fisher Vectors for Microscopy Image Classification. IEEE Trans Med Imaging 2017; 36:1636-1649. [PMID: 28358678 DOI: 10.1109/tmi.2017.2687466] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 06/06/2023]
Abstract
Microscopy image classification is important in various biomedical applications, such as cancer subtype identification, and protein localization for high content screening. To achieve automated and effective microscopy image classification, the representative and discriminative capability of image feature descriptors is essential. To this end, in this paper, we propose a new feature representation algorithm to facilitate automated microscopy image classification. In particular, we incorporate Fisher vector (FV) encoding with multiple types of local features that are handcrafted or learned, and we design a separation-guided dimension reduction method to reduce the descriptor dimension while increasing its discriminative capability. Our method is evaluated on four publicly available microscopy image data sets of different imaging types and applications, including the UCSB breast cancer data set, MICCAI 2015 CBTC challenge data set, and IICBU malignant lymphoma, and RNAi data sets. Our experimental results demonstrate the advantage of the proposed low-dimensional FV representation, showing consistent performance improvement over the existing state of the art and the commonly used dimension reduction techniques.
24
Song Y, Li Q, Zhang F, Huang H, Feng D, Wang Y, Chen M, Cai W. Dual discriminative local coding for tissue aging analysis. Med Image Anal 2017; 38:65-76. [PMID: 28282641 DOI: 10.1016/j.media.2016.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2016] [Revised: 07/12/2016] [Accepted: 10/05/2016] [Indexed: 11/26/2022]
Abstract
In aging research, morphological age of tissue helps to characterize the effects of aging on different individuals. While currently manual evaluations are used to estimate morphological ages under microscopy, such operation is difficult and subjective due to the complex visual characteristics of tissue images. In this paper, we propose an automated method to quantify morphological ages of tissues from microscopy images. We design a new sparse representation method, namely dual discriminative local coding (DDLC), that classifies the tissue images into different chronological ages. DDLC incorporates discriminative distance learning and dual-level local coding into the basis model of locality-constrained linear coding and thus achieves higher discriminative capability. The morphological age is then computed based on the classification scores. We conducted our study using the publicly available terminal bulb aging database that has been commonly used in existing microscopy imaging research. To represent these images, we also design a highly descriptive descriptor that combines several complementary texture features extracted at two scales. Experimental results show that our method achieves significant improvement in age classification when compared to the existing approaches and other popular classifiers. We also present promising results in quantification of morphological ages.
Affiliation(s)
- Yang Song
  - School of Information Technologies, University of Sydney, Australia.
- Qing Li
  - School of Information Technologies, University of Sydney, Australia
- Fan Zhang
  - School of Information Technologies, University of Sydney, Australia
- Heng Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, USA
- Dagan Feng
  - School of Information Technologies, University of Sydney, Australia; Med-X Research Institute, Shanghai Jiaotong University, China
- Yue Wang
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, USA
- Mei Chen
  - Computer Engineering Department, University of Albany State University of New York, USA; Robotics Institute, Carnegie Mellon University, USA
- Weidong Cai
  - School of Information Technologies, University of Sydney, Australia
25
|
An Overview of data science uses in bioimage informatics. Methods 2017; 115:110-118. [DOI: 10.1016/j.ymeth.2016.12.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 08/31/2016] [Revised: 12/09/2016] [Accepted: 12/30/2016] [Indexed: 01/17/2023] Open
|
26
|
Song Y, Cai W, Huang H, Feng D, Wang Y, Chen M. Bioimage classification with subcategory discriminant transform of high dimensional visual descriptors. BMC Bioinformatics 2016; 17:465. [PMID: 27852213 PMCID: PMC5112644 DOI: 10.1186/s12859-016-1318-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/07/2016] [Accepted: 11/01/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Bioimage classification is a fundamental problem for many important biological studies that require accurate cell phenotype recognition, subcellular localization, and histopathological classification. In this paper, we present a new bioimage classification method that can be generally applicable to a wide variety of classification problems. We propose to use a high-dimensional multi-modal descriptor that combines multiple texture features. We also design a novel subcategory discriminant transform (SDT) algorithm to further enhance the discriminative power of descriptors by learning convolution kernels to reduce the within-class variation and increase the between-class difference. RESULTS: We evaluate our method on eight different bioimage classification tasks using the publicly available IICBU 2008 database. Each task comprises a separate dataset, and the collection represents typical subcellular, cellular, and tissue level classification problems. Our method demonstrates improved classification accuracy (0.9 to 9%) on six tasks when compared to state-of-the-art approaches. We also find that SDT outperforms the well-known dimension reduction techniques, with for example 0.2 to 13% improvement over linear discriminant analysis. CONCLUSIONS: We present a general bioimage classification method, which comprises a highly descriptive visual feature representation and a learning-based discriminative feature transformation algorithm. Our evaluation on the IICBU 2008 database demonstrates improved performance over the state-of-the-art for six different classification tasks.
Affiliation(s)
- Yang Song
  - School of Information Technologies, The University of Sydney, Sydney, Australia
- Weidong Cai
  - School of Information Technologies, The University of Sydney, Sydney, Australia
- Heng Huang
  - Department of Computer Science and Engineering, University of Texas, Arlington, USA
- Dagan Feng
  - School of Information Technologies, The University of Sydney, Sydney, Australia
  - Med-X Research Institute, Shanghai Jiaotong University, Shanghai, China
- Yue Wang
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, USA
- Mei Chen
  - Computer Engineering Department, University of Albany State University of New York, Albany, USA
  - Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
|
27
|
Bougen-Zhukov N, Loh SY, Lee HK, Loo LH. Large-scale image-based screening and profiling of cellular phenotypes. Cytometry A 2016; 91:115-125. [PMID: 27434125 DOI: 10.1002/cyto.a.22909] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 12/30/2022]
Abstract
Cellular phenotypes are observable characteristics of cells resulting from the interactions of intrinsic and extrinsic chemical or biochemical factors. Image-based phenotypic screens under large numbers of basal or perturbed conditions can be used to study the influences of these factors on cellular phenotypes. Hundreds to thousands of phenotypic descriptors can also be quantified from the images of cells under each of these experimental conditions. Therefore, huge amounts of data can be generated, and the analysis of these data has become a major bottleneck in large-scale phenotypic screens. Here, we review current experimental and computational methods for large-scale image-based phenotypic screens. Our focus is on phenotypic profiling, a computational procedure for constructing quantitative and compact representations of cellular phenotypes based on the images collected in these screens. © 2016 International Society for Advancement of Cytometry.
Affiliation(s)
- Nicola Bougen-Zhukov
  - Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
- Sheng Yang Loh
  - Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
- Hwee Kuan Lee
  - Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
- Lit-Hsin Loo
  - Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671, Singapore
  - Department of Pharmacology, School of Medicine, National University of Singapore, Singapore, 117600, Singapore
|
28
|
Fricker MD, Moger J, Littlejohn GR, Deeks MJ. Making microscopy count: quantitative light microscopy of dynamic processes in living plants. J Microsc 2016; 263:181-91. [PMID: 27145353 DOI: 10.1111/jmi.12403] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/03/2015] [Revised: 01/31/2016] [Accepted: 02/16/2016] [Indexed: 12/18/2022]
Abstract
Cell theory has officially reached 350 years of age as the first use of the word 'cell' in a biological context can be traced to a description of plant material by Robert Hooke in his historic publication 'Micrographia: or some physiological definitions of minute bodies'. The 2015 Royal Microscopical Society Botanical Microscopy meeting was a celebration of the streams of investigation initiated by Hooke to understand at the subcellular scale how plant cell function and form arises. Much of the work presented, and Honorary Fellowships awarded, reflected the advanced application of bioimaging informatics to extract quantitative data from micrographs that reveal dynamic molecular processes driving cell growth and physiology. The field has progressed from collecting many pixels in multiple modes to associating these measurements with objects or features that are meaningful biologically. The additional complexity involves object identification that draws on a different type of expertise from computer science and statistics that is often impenetrable to biologists. There are many useful tools and approaches being developed, but we now need more interdisciplinary exchange to use them effectively. In this review we show how this quiet revolution has provided tools available to any personal computer user. We also discuss the oft-neglected issue of quantifying algorithm robustness and the exciting possibilities offered through the integration of physiological information generated by biosensors with object detection and tracking.
Affiliation(s)
- Mark D Fricker
  - Department of Plant Sciences, University of Oxford, Oxford, U.K.
- Julian Moger
  - Department of Physics, University of Exeter, Exeter, Devon, U.K.
- Michael J Deeks
  - Department of Biosciences, University of Exeter, Exeter, Devon, U.K.
|
29
|
Donovan RM, Tapia JJ, Sullivan DP, Faeder JR, Murphy RF, Dittrich M, Zuckerman DM. Unbiased Rare Event Sampling in Spatial Stochastic Systems Biology Models Using a Weighted Ensemble of Trajectories. PLoS Comput Biol 2016; 12:e1004611. [PMID: 26845334 PMCID: PMC4741515 DOI: 10.1371/journal.pcbi.1004611] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/10/2015] [Accepted: 10/16/2015] [Indexed: 12/25/2022] Open
Abstract
The long-term goal of connecting scales in biological simulation can be facilitated by scale-agnostic methods. We demonstrate that the weighted ensemble (WE) strategy, initially developed for molecular simulations, applies effectively to spatially resolved cell-scale simulations. The WE approach runs an ensemble of parallel trajectories with assigned weights and uses a statistical resampling strategy of replicating and pruning trajectories to focus computational effort on difficult-to-sample regions. The method can also generate unbiased estimates of non-equilibrium and equilibrium observables, sometimes with significantly less aggregate computing time than would be possible using standard parallelization. Here, we use WE to orchestrate particle-based kinetic Monte Carlo simulations, which include spatial geometry (e.g., of organelles, plasma membrane) and biochemical interactions among mobile molecular species. We study a series of models exhibiting spatial, temporal and biochemical complexity and show that although WE has important limitations, it can achieve performance significantly exceeding standard parallel simulation—by orders of magnitude for some observables. Stochastic simulations (simulations where randomness plays a role) of even simple biological systems are often so computationally intensive that it is impossible, in practice, to simulate them exhaustively and gather good statistics about the likelihood of different outcomes. The difficulty is compounded for the observation of rare events in these simulations; unfortunately, rare events, such as state transitions and barrier crossings, are often those of particular interest. Using the weighted ensemble (WE) method, we are able to enhance the characterization of rare events in cell biology simulations, but in such a way that the statistics for these events remain unbiased. 
The histogram of outcomes that WE produces has the same shape as a naive one, but the resolution of events in the tails of the histogram is greatly improved. This improved resolution in rare event statistics can be used to infer unbiased estimates of long timescale dynamics from short simulations, and we show that using a weighted ensemble can result in a reduction in total simulation time needed to sample certain events of interest in spatial, stochastic models of biological systems.
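The replicate-and-prune resampling described in this abstract can be illustrated with a minimal sketch. It is a generic split/merge step for one bin of weighted trajectories, not the authors' implementation: the function name `resample_bin`, the tuple layout `(state, weight)`, and the rule of always splitting the heaviest walker and merging the two lightest are assumptions made for illustration. Weights are assumed to be strictly positive.

```python
import random


def resample_bin(walkers, target, rng=random):
    """Resample one bin to `target` walkers while conserving total weight.

    walkers: list of (state, weight) tuples with positive weights.
    Hypothetical helper for illustration only.
    """
    if not walkers or target < 1:
        return list(walkers)
    walkers = sorted(walkers, key=lambda w: w[1])
    # Replicate: split the heaviest walker into two copies of half weight.
    while len(walkers) < target:
        state, weight = walkers.pop()  # heaviest walker is last
        walkers += [(state, weight / 2), (state, weight / 2)]
        walkers.sort(key=lambda w: w[1])
    # Prune: merge the two lightest walkers. The survivor is chosen with
    # probability proportional to its weight and carries the combined
    # weight, which keeps weighted averages unbiased in expectation.
    while len(walkers) > target:
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        survivor = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
        walkers.sort(key=lambda w: w[1])
    return walkers
```

Because splitting and merging only redistribute weight, the total weight in the bin is the same before and after the step; only the number of trajectories spent on that region changes.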
Affiliation(s)
- Rory M. Donovan
  - Joint CMU-Pitt Ph.D. Program in Computational Biology, Pittsburgh, Pennsylvania, United States of America
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jose-Juan Tapia
  - Joint CMU-Pitt Ph.D. Program in Computational Biology, Pittsburgh, Pennsylvania, United States of America
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Devin P. Sullivan
  - Joint CMU-Pitt Ph.D. Program in Computational Biology, Pittsburgh, Pennsylvania, United States of America
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- James R. Faeder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Robert F. Murphy
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Markus Dittrich
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Supercomputing Center, Pittsburgh, Pennsylvania, United States of America
- Daniel M. Zuckerman
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
|
30
|
Naik AW, Kangas JD, Sullivan DP, Murphy RF. Active machine learning-driven experimentation to determine compound effects on protein patterns. eLife 2016; 5:e10047. [PMID: 26840049 PMCID: PMC4798950 DOI: 10.7554/elife.10047] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/13/2015] [Accepted: 01/28/2016] [Indexed: 12/03/2022] Open
Abstract
High throughput screening determines the effects of many conditions on a given biological target. Currently, to estimate the effects of those conditions on other targets requires either strong modeling assumptions (e.g. similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance. DOI:http://dx.doi.org/10.7554/eLife.10047.001 Biomedical scientists have invested significant effort into making it easy to perform lots of experiments quickly and cheaply. These “high throughput” methods are the workhorses of modern “systems biology” efforts. However, we simply cannot perform an experiment for every possible combination of different cell type, genetic mutation and other conditions. In practice this has led researchers to either exhaustively test a few conditions or targets, or to try to pick the experiments that best allow a particular problem to be explored. But which experiments should we pick? The ones we think we can predict the outcome of accurately, the ones for which we are uncertain what the results will be, or a combination of the two? Humans are not particularly well suited for this task because it requires reasoning about many possible outcomes at the same time. 
However, computers are much better at handling statistics for many experiments, and machine learning algorithms allow computers to “learn” how to make predictions and decisions based on the data they’ve previously processed. Previous computer simulations showed that a machine learning approach termed “active learning” could do a good job of picking a series of experiments to perform in order to efficiently learn a model that predicts the results of experiments that were not done. Now, Naik et al. have performed cell biology experiments in which experiments were chosen by an active learning algorithm and then performed using liquid handling robots and an automated microscope. The key idea behind the approach is that you learn more from an experiment you can’t predict (or that you predicted incorrectly) than from just confirming your confident predictions. The results of the robot-driven experiments showed that the active learning approach outperforms strategies a human might use, even when the potential outcomes of individual experiments are not known beforehand. The next challenge is to apply these methods to reduce the cost of achieving the goals of large projects, such as The Cancer Genome Atlas. DOI:http://dx.doi.org/10.7554/eLife.10047.002
Affiliation(s)
- Armaghan W Naik
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States
  - Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, United States
- Joshua D Kangas
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States
  - Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, United States
- Devin P Sullivan
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States
  - Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, United States
- Robert F Murphy
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States
  - Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, United States
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, United States
  - Freiburg Institute for Advanced Studies, Albert Ludwig University of Freiburg, Freiburg, Germany
  - Faculty of Biology, Albert Ludwig University of Freiburg, Freiburg, Germany
|
31
|
CP-CHARM: segmentation-free image classification made accessible. BMC Bioinformatics 2016; 17:51. [PMID: 26817459 PMCID: PMC4729047 DOI: 10.1186/s12859-016-0895-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 09/25/2015] [Accepted: 01/18/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Automated classification using machine learning often relies on features derived from segmenting individual objects, which can be difficult to automate. WND-CHARM is a previously developed classification algorithm in which features are computed on the whole image, thereby avoiding the need for segmentation. The algorithm obtained encouraging results but requires considerable computational expertise to execute. Furthermore, some benchmark sets have been shown to be subject to confounding artifacts that overestimate classification accuracy. Results: We developed CP-CHARM, a user-friendly image-based classification algorithm inspired by WND-CHARM in (i) its ability to capture a wide variety of morphological aspects of the image, and (ii) the absence of requirement for segmentation. In order to make such an image-based classification method easily accessible to the biological research community, CP-CHARM relies on the widely-used open-source image analysis software CellProfiler for feature extraction. To validate our method, we reproduced WND-CHARM’s results and ensured that CP-CHARM obtained comparable performance. We then successfully applied our approach on cell-based assay data and on tissue images. We designed these new training and test sets to reduce the effect of batch-related artifacts. Conclusions: The proposed method preserves the strengths of WND-CHARM - it extracts a wide variety of morphological features directly on whole images thereby avoiding the need for cell segmentation, but additionally, it makes the methods easily accessible for researchers without computational expertise by implementing them as a CellProfiler pipeline. It has been demonstrated to perform well on a wide range of bioimage classification problems, including on new datasets that have been carefully selected and annotated to minimize batch effects. This provides for the first time a realistic and reliable assessment of the whole image classification strategy.
|
32
|
Yang Q, Zou HY, Zhang Y, Tang LJ, Shen GL, Jiang JH, Yu RQ. Multiplex protein pattern unmixing using a non-linear variable-weighted support vector machine as optimized by a particle swarm optimization algorithm. Talanta 2016; 147:609-14. [DOI: 10.1016/j.talanta.2015.10.047] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/06/2015] [Revised: 10/14/2015] [Accepted: 10/18/2015] [Indexed: 11/30/2022]
|
33
|
Shao W, Liu M, Zhang D. Human cell structure-driven model construction for predicting protein subcellular location from biological images. Bioinformatics 2015; 32:114-21. [PMID: 26363175 DOI: 10.1093/bioinformatics/btv521] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/21/2015] [Accepted: 08/31/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. RESULTS: We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. AVAILABILITY AND IMPLEMENTATION: The dataset and code can be downloaded from https://github.com/shaoweinuaa/. CONTACT: dqzhang@nuaa.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
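For readers unfamiliar with error correcting output coding, the following minimal sketch shows the generic decoding step: each column of a codeword matrix corresponds to one binary classifier, and a sample is assigned to the class whose codeword is closest in Hamming distance to the vector of binary outputs. The matrix, the class names, and the function names below are hypothetical; this is not the structure-driven matrix or the multi-kernel SVM ensemble from the paper.

```python
# Hypothetical codeword matrix: one row per class, one column per
# binary classifier. Any two rows differ in 4 positions, so a single
# wrong binary output can still be corrected.
CODEWORDS = {
    "nucleus":      (1, 1, 1, 0, 0, 0),
    "cytoplasm":    (1, 0, 0, 1, 1, 0),
    "mitochondria": (0, 1, 0, 1, 0, 1),
    "vesicles":     (0, 0, 1, 0, 1, 1),
}


def hamming(a, b):
    """Count the positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is nearest to the classifier outputs."""
    return min(CODEWORDS, key=lambda name: hamming(CODEWORDS[name], bits))
```

With this matrix, `decode((1, 1, 0, 0, 0, 0))` still resolves to the nucleus row even though the third binary output is flipped, which is the error-correcting property the framework is named for.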
Affiliation(s)
- Wei Shao
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Mingxia Liu
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Daoqiang Zhang
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
|
34
|
Coelho LP, Pato C, Friães A, Neumann A, von Köckritz-Blickwede M, Ramirez M, Carriço JA. Automatic determination of NET (neutrophil extracellular traps) coverage in fluorescent microscopy images. Bioinformatics 2015; 31:2364-70. [PMID: 25792554 DOI: 10.1093/bioinformatics/btv156] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/09/2014] [Accepted: 02/16/2015] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION: Neutrophil extracellular traps (NETs) are believed to be essential in controlling several bacterial pathogens. Quantification of NETs in vitro is an important tool in studies aiming to clarify the biological and chemical factors contributing to NET production, stabilization and degradation. This estimation can be performed on the basis of fluorescent microscopy images using appropriate labelings. In this context, it is desirable to automate the analysis to eliminate both the tedious process of manual annotation and possible operator-specific biases. RESULTS: We propose a framework for the automated determination of NET content, based on visually annotated images which are used to train a supervised machine-learning method. We derive several methods in this framework. The best results are obtained by combining these into a single prediction. The overall Q² of the combined method is 93%. By having two experts label part of the image set, we were able to compare the performance of the algorithms to the human interoperator variability. We find that the two operators exhibited a very high correlation on their overall assessment of the NET coverage area in the images (R² is 97%), although there were consistent differences in labeling at pixel level (Q², which unlike R² does not correct for additive and multiplicative biases, was only 89%). AVAILABILITY AND IMPLEMENTATION: Open source software (under the MIT license) is available at https://github.com/luispedro/Coelho2015_NetsDetermination for both reproducibility and application to new data.
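The distinction this abstract draws between the two scores can be made concrete with a small sketch. It assumes, since the abstract does not spell out the formulas, that the first score is one minus the ratio of squared prediction error to the variance of the reference values, and that the second is the squared Pearson correlation; the function names `q2` and `r2` are illustrative.

```python
def q2(truth, pred):
    """1 - SS_residual / SS_total, computed directly against the reference."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1 - ss_res / ss_tot


def r2(truth, pred):
    """Squared Pearson correlation; unchanged by scaling or shifting pred."""
    n = len(truth)
    mean_t, mean_p = sum(truth) / n, sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_t * var_p)


# A second annotator who consistently reports twice the coverage plus a
# fixed offset is perfectly correlated with the first, yet far from agreeing.
reference = [0.1, 0.2, 0.4, 0.8]
biased = [2 * t + 0.1 for t in reference]
```

On this toy data the correlation-based score is essentially 1, while the error-based score is strongly penalized (it is negative here), which mirrors the pattern of high correlation alongside a lower agreement score reported in the abstract.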
Affiliation(s)
- Luis Pedro Coelho
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
- Catarina Pato
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
- Ana Friães
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
- Ariane Neumann
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
- Mário Ramirez
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
- João André Carriço
  - Unidade de Biofísica e Expressão Genética, Instituto de Medicina Molecular and
|
35
|
Krauß SD, Petersen D, Niedieker D, Fricke I, Freier E, El-Mashtoly SF, Gerwert K, Mosig A. Colocalization of fluorescence and Raman microscopic images for the identification of subcellular compartments: a validation study. Analyst 2015; 140:2360-8. [DOI: 10.1039/c4an02153c] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/15/2023]
Abstract
This paper introduces algorithms for identifying overlapping observations between Raman and fluorescence microscopic images of one and the same sample.
Affiliation(s)
- Sascha D. Krauß
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Dennis Petersen
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Daniel Niedieker
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Inka Fricke
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Erik Freier
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Klaus Gerwert
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
- Axel Mosig
  - Department of Biophysics, Ruhr-University Bochum, 44780 Bochum, Germany
|
36
|
Abbas SS, Dijkstra TMH, Heskes T. A comparative study of cell classifiers for image-based high-throughput screening. BMC Bioinformatics 2014; 15:342. [PMID: 25336059 PMCID: PMC4287552 DOI: 10.1186/1471-2105-15-342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/07/2014] [Accepted: 09/29/2014] [Indexed: 11/24/2022] Open
Abstract
Background: Millions of cells are present in thousands of images created in high-throughput screening (HTS). Biologists could classify each of these cells into a phenotype by visual inspection. But in the presence of millions of cells this visual classification task becomes infeasible. Biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visual inspection of the important misclassified phenotypes. Classification methods differ in performance and performance evaluation time. We present a comparative study of computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells. Results: For the HT29 data set we find that gentle boosting, SVM (linear) and SVM (RBF) are close in performance but SVM (linear) is faster than gentle boosting and SVM (RBF). For the HT29 data set the average performance difference between SVM (RBF) and SVM (linear) is 0.42%. For the HeLa data set we find that SVM (RBF) outperforms other classification methods and is on average 1.41% better in performance than SVM (linear). Conclusions: Our study proposes SVM (linear) for iterative improvement of the training data and SVM (RBF) for the final classifier to classify all unlabeled cells in the whole data set.
Affiliation(s)
- Syed Saiden Abbas
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, Netherlands
|
37
|
Yang F, Xu YY, Shen HB. Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification? ScientificWorldJournal 2014; 2014:429049. [PMID: 25050396 PMCID: PMC4094881 DOI: 10.1155/2014/429049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/03/2014] [Accepted: 05/22/2014] [Indexed: 01/14/2023] Open
Abstract
Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
Affiliation(s)
- Fan Yang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
  - Key Laboratory of Optic-Electronic and Communication, Jiangxi Science & Technology Normal University, Nanchang 330013, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Ying-Ying Xu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
38
|
Yang F, Xu YY, Wang ST, Shen HB. Image-based classification of protein subcellular location patterns in human reproductive tissue by ensemble learning global and local features. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.10.034] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 10/26/2022]
|
39
|
Predicting human protein subcellular locations by the ensemble of multiple predictors via protein-protein interaction network with edge clustering coefficients. PLoS One 2014; 9:e86879. [PMID: 24466278 PMCID: PMC3900678 DOI: 10.1371/journal.pone.0086879] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 10/14/2013] [Accepted: 12/18/2013] [Indexed: 12/14/2022]
Abstract
One of the fundamental tasks in biology is to identify the functions of all proteins to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins provides key hints about their functions and helps in understanding the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been studied extensively over the past two decades. Many methods have been developed based on protein primary sequences as well as on protein-protein interaction networks. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, not only the protein itself but also all of its interacting partners are considered. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. In addition, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than that of the state-of-the-art methods.
|