1
Rose M, Burgess JT, Cheong CM, Adams MN, Shahrouzi P, O’Byrne KJ, Richard DJ, Bolderson E. The expression and role of the Lem-D proteins Ankle2, Emerin, Lemd2, and TMPO in triple-negative breast cancer cell growth. Front Oncol 2024; 14:1222698. [PMID: 38720803 PMCID: PMC11076778 DOI: 10.3389/fonc.2024.1222698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 02/28/2024] [Indexed: 05/12/2024] Open
Abstract
Background
Triple-negative breast cancer (TNBC) is a sub-classification of breast carcinoma that leads to poor survival outcomes for patients. TNBCs do not possess the hormone receptors that are frequently targeted therapeutically in other cancer subtypes and, therefore, chemotherapy remains the standard treatment for TNBC. Nuclear envelope proteins are frequently dysregulated in cancer cells, supporting their potential as novel cancer therapy targets. The Lem-domain (Lem-D; LAP2, Emerin, MAN1 domain) proteins are a family of inner nuclear membrane proteins that share a ~45-residue Lem-D. The Lem-D proteins, including Ankle2, Lemd2, TMPO, and Emerin, have been shown to be associated with many of the hallmarks of cancer. This study aimed to define the association between the Lem-D proteins and TNBC and determine whether these proteins could be promising therapeutic targets.
Methods
GENT2, TCGA, and KM plotter were used to investigate the expression and prognostic implications of several Lem-D proteins (Ankle2, TMPO, Emerin, and Lemd2) in publicly available breast cancer patient data. Immunoblotting and immunofluorescence analysis of immortalized non-cancerous breast cells and a panel of TNBC cells were used to establish whether protein expression of the Lem-D proteins was significantly altered in TNBC. siRNA was used to decrease individual Lem-D protein expression, and functional assays, including proliferation assays and apoptosis assays, were conducted.
Results
The Lem-D proteins were generally overexpressed in TNBC patient samples at the mRNA level and showed variable expression at the protein level in TNBC cell lysates. Similarly, protein levels were generally negatively correlated with patient survival outcomes. siRNA-mediated depletion of the individual Lem-D proteins in TNBC cells induced aberrant nuclear morphology, decreased proliferation, and induced cell death. However, minimal effects on nuclear morphology or cell viability were observed following Lem-D depletion in non-cancerous MCF10A cells.
Conclusion
There is evidence to suggest that Ankle2, TMPO, Emerin, and Lemd2 expression is correlated with breast cancer patient outcomes, but larger patient sample numbers are required to confirm this. siRNA-mediated depletion of these proteins was shown to specifically impair TNBC cell growth, suggesting that the Lem-D proteins may be a specific anti-cancer target.
Affiliation(s)
- Maddison Rose
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
- Joshua T. Burgess
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
- Chee Man Cheong
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
- Mark N. Adams
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
- Parastoo Shahrouzi
  - Department of Medical Genetics, Faculty of Medicine, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Kenneth J. O’Byrne
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
  - Cancer Services, Princess Alexandra Hospital, Brisbane, QLD, Australia
- Derek J. Richard
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
- Emma Bolderson
  - Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Translational Research Institute, Queensland University of Technology, Brisbane, QLD, Australia
2
Bao LX, Luo ZM, Zhu XL, Xu YY. Automated identification of protein expression intensity and classification of protein cellular locations in mouse brain regions from immunofluorescence images. Med Biol Eng Comput 2024; 62:1105-1119. [PMID: 38150111 DOI: 10.1007/s11517-023-02985-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 11/28/2023] [Indexed: 12/28/2023]
Abstract
Knowledge of protein expression in mammalian brains at regional and cellular levels can facilitate understanding of protein functions and associated diseases. As the mouse brain is a typical mammalian brain considering cell type and structure, several studies have been conducted to analyze protein expression in mouse brains. However, labeling protein expression using biotechnology is costly and time-consuming. Therefore, automated models that can accurately recognize protein expression are needed. Here, we constructed machine learning models to automatically annotate the protein expression intensity and cellular location in different mouse brain regions from immunofluorescence images. The brain regions and sub-regions were segmented through learning image features using an autoencoder and then performing K-means clustering and registration to align with the anatomical references. The protein expression intensities for those segmented structures were computed on the basis of the statistics of the image pixels, and patch-based weakly supervised methods and multi-instance learning were used to classify the cellular locations. Results demonstrated that the models achieved high accuracy in the expression intensity estimation, and the F1 score of the cellular location prediction was 74.5%. This work established an automated pipeline for analyzing mouse brain images and provided a foundation for further study of protein expression and functions.
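The segmentation step described in this abstract (learned features, then K-means clustering, then per-region pixel statistics) can be illustrated with a minimal sketch. It assumes hand-made 2-D feature vectors in place of real autoencoder outputs and uses the mean pixel value as the intensity statistic; the names `kmeans` and `region_intensity` and all data are hypothetical, and registration to an anatomical reference is not shown.

```python
from statistics import mean

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's k-means on tuples of floats; `centroids` are the initial centers."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids

def region_intensity(pixels, labels, region):
    """Mean pixel intensity inside one clustered region (assumed statistic)."""
    return mean(v for v, lab in zip(pixels, labels) if lab == region)

# Hypothetical feature vectors standing in for autoencoder outputs,
# with one raw pixel intensity per feature vector.
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
pixels = [10, 12, 14, 200, 210, 190]
labels, centers = kmeans(features, centroids=[features[0], features[3]])
print(labels, region_intensity(pixels, labels, 0), region_intensity(pixels, labels, 1))
```

Fixing the initial centroids keeps the toy run deterministic; a real pipeline would more likely use several random restarts.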
Affiliation(s)
- Lin-Xia Bao
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Zhuo-Ming Luo
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Xi-Liang Zhu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Ying-Ying Xu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
3
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell gives insights into tailored drug design for a disease. Conventional wet-lab experiments, the gold standard for validation, use fluorescence microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags to identify protein subcellular locations. However, the booming era of proteomics and high-throughput sequencing generates large numbers of newly discovered proteins, making wet-lab determination of subcellular localization for all of them impractical. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development across three typical types of approaches: sequence-based, knowledge-based, and image-based methods. We also discuss in detail the existing challenges and future directions of AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
  - Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
4
Xue ZZ, Li C, Luo ZM, Wang SS, Xu YY. Automated classification of protein expression levels in immunohistochemistry images to improve the detection of cancer biomarkers. BMC Bioinformatics 2022; 23:470. [DOI: 10.1186/s12859-022-05015-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 10/29/2022] [Indexed: 11/09/2022] Open
Abstract
Background
The expression changes of some proteins are associated with cancer progression, and can be used as biomarkers in cancer diagnosis. Automated systems have been frequently applied in the large-scale detection of protein biomarkers and have provided a valuable complement for wet-laboratory experiments. For example, our previous work used an immunohistochemical image-based machine learning classifier of protein subcellular locations to screen biomarker proteins that change locations in colon cancer tissues. The tool could recognize the location of biomarkers but did not consider the effect of protein expression level changes on the screening process.
Results
In this study, we built an automated classification model that recognizes protein expression levels in immunohistochemical images, and used the protein expression levels in combination with subcellular locations to screen cancer biomarkers. To minimize the effect of non-informative sections on the immunohistochemical images, we employed the representative image patches as input and applied a Wasserstein distance method to determine the number of patches. For the patches and the whole images, we compared the ability of color features, characteristic curve features, and deep convolutional neural network features to distinguish different levels of protein expression and employed deep learning and conventional classification models. Experimental results showed that the best classifier can achieve an accuracy of 73.72% and an F1-score of 0.6343. In the screening of protein biomarkers, the detection accuracy improved from 63.64 to 95.45% upon the incorporation of the protein expression changes.
Conclusions
Machine learning can distinguish different protein expression levels and speed up their annotation in the future. Combining information on the expression patterns and subcellular locations of protein can improve the accuracy of automatic cancer biomarker screening. This work could be useful in discovering new cancer biomarkers for clinical diagnosis and research.
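The Results above mention a Wasserstein distance method for choosing the number of patches. The abstract does not say how the distance was applied, so the following is only a sketch of the distance itself for two intensity histograms on the same unit-spaced bins; the function name `wasserstein_1d` and the example histograms are hypothetical. One conceivable use (an assumption, not the authors' stated procedure) is to compare the pooled histogram of the selected patches with that of the whole image.

```python
from itertools import accumulate

def wasserstein_1d(hist_p, hist_q):
    """Wasserstein-1 distance between two histograms sharing unit-spaced bins.

    In one dimension the distance equals the summed absolute difference
    between the two cumulative distribution functions.
    """
    total_p, total_q = sum(hist_p), sum(hist_q)
    cdf_p = accumulate(v / total_p for v in hist_p)
    cdf_q = accumulate(v / total_q for v in hist_q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

# Hypothetical intensity histograms (counts per bin).
whole_image = [40, 30, 20, 10]
patches = [35, 30, 25, 10]
print(wasserstein_1d(whole_image, patches))
```

A small distance indicates that the two intensity distributions are similar; moving all mass across two bins, as in `wasserstein_1d([1, 0, 0], [0, 0, 1])`, gives a distance of 2.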
5
Hu JX, Yang Y, Xu YY, Shen HB. GraphLoc: a graph neural network model for predicting protein subcellular localization from immunohistochemistry images. Bioinformatics 2022; 38:4941-4948. [DOI: 10.1093/bioinformatics/btac634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 09/07/2022] [Accepted: 09/15/2022] [Indexed: 11/14/2022] Open
Abstract
Motivation
Recognition of protein subcellular distribution patterns and identification of location biomarker proteins in cancer tissues are important for understanding protein functions and related diseases. Immunohistochemical (IHC) images enable visualization of the distribution of proteins at the tissue level, providing an important resource for protein localization studies. In the past decades, several image-based protein subcellular location prediction methods have been developed, but their prediction accuracy still has considerable room for improvement because of the complexity of protein patterns resulting from multi-label proteins and the variation of location patterns across cell types or states.
Results
Here, we propose a multi-label multi-instance model based on deep graph convolutional neural networks, GraphLoc, to recognize protein subcellular location patterns. GraphLoc builds a graph of multiple IHC images for one protein, learns protein-level representations by graph convolutions, and predicts multi-label information by a dynamic threshold method. Our results show that GraphLoc is a promising model for image-based protein subcellular location prediction with model interpretability. Furthermore, we apply GraphLoc to the identification of candidate location biomarkers and potential members of protein networks. A large portion of the predicted results have supporting evidence in the existing literature, and the new candidates also provide guidance for further experimental screening.
Availability
The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/GraphLoc.
Supplementary information
Supplementary data are available at Bioinformatics online.
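The abstract above mentions a dynamic threshold method for multi-label prediction without giving its rule. As a rough illustration of the general idea, the sketch below keeps every label whose score is within a fixed fraction of the top score, so a protein can receive one or several locations. The rule, the function name `dynamic_threshold`, and the parameter `theta` are assumptions for illustration, not GraphLoc's actual criterion.

```python
def dynamic_threshold(scores, theta=0.8):
    """Return indices of labels scoring at least theta * (highest score).

    Assumes non-negative scores such as per-class probabilities.
    The cut-off moves with the top score instead of being a fixed constant.
    """
    top = max(scores)
    return [i for i, s in enumerate(scores) if s >= theta * top]

# Hypothetical per-location scores for two proteins.
two_locations = dynamic_threshold([0.9, 0.75, 0.1])   # two scores close to the top
one_location = dynamic_threshold([0.3, 0.05, 0.02])   # only the top score survives
print(two_locations, one_location)
```

Unlike a fixed cut-off of 0.5, this rule always returns at least one label, which suits proteins whose scores are uniformly low.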
Affiliation(s)
- Jin-Xian Hu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yang Yang
  - Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China
- Ying-Ying Xu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
6
Tu Y, Lei H, Shen HB, Yang Y. SIFLoc: a self-supervised pre-training method for enhancing the recognition of protein subcellular localization in immunofluorescence microscopic images. Brief Bioinform 2022; 23:6527276. [DOI: 10.1093/bib/bbab605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021] [Indexed: 12/19/2022] Open
Abstract
With the rapid growth of high-resolution microscopy imaging data, revealing the subcellular map of human proteins has become a central task in spatial proteomics. The cell atlas of the Human Protein Atlas (HPA) provides precious resources for recognizing subcellular localization patterns at the cell level, and the large-scale annotated data enable learning via advanced deep neural networks. However, the existing predictors still suffer from the imbalanced class distribution and the lack of labeled data for minor classes. Thus, it is necessary to develop new methods for coping with these issues. We leverage the self-supervised learning protocol to address these problems. Specifically, we propose a pre-training scheme, called SIFLoc, to enhance the conventional supervised learning framework. The pre-training features a hybrid data augmentation method and a modified contrastive loss function, aiming to learn good feature representations from microscopic images. The experiments are performed on a large-scale immunofluorescence microscopic image dataset collected from the HPA database. Using the same deep neural networks as the classifier, the model pre-trained via SIFLoc not only outperforms the model without pre-training by a large margin but also shows advantages over the state-of-the-art self-supervised learning methods. In particular, SIFLoc significantly improves the prediction accuracy for minor organelles.
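For readers unfamiliar with contrastive pre-training, the sketch below computes a standard InfoNCE-style contrastive loss for one anchor embedding, one positive (another augmented view of the same image), and a set of negatives. This is the generic textbook form, not SIFLoc's modified loss, which the abstract does not specify; the embeddings, the temperature value, and the function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss: low when the anchor is closer to its positive
    than to any negative, high otherwise."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical 2-D embeddings of augmented image views.
anchor = (1.0, 0.0)
loss_aligned = info_nce(anchor, (1.0, 0.0), [(0.0, 1.0), (-1.0, 0.0)])
loss_misaligned = info_nce(anchor, (0.0, 1.0), [(1.0, 0.0), (-1.0, 0.0)])
print(loss_aligned, loss_misaligned)
```

Minimizing this quantity pulls two views of the same image together in feature space and pushes other images away, which is the sense in which pre-training can learn representations before any labels are used.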
Affiliation(s)
- Yanlun Tu
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Houchao Lei
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Hong-Bin Shen
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
  - Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Shanghai Jiao Tong University, 200240 Shanghai, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
7
Wang G, Xue MQ, Shen HB, Xu YY. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks. Brief Bioinform 2022; 23:6499983. [PMID: 35018423 DOI: 10.1093/bib/bbab539] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 11/03/2021] [Accepted: 11/20/2021] [Indexed: 11/13/2022] Open
Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, producing plenty of automated predictors for protein subcellular localization. However, most of these predictors are trained solely on either high-throughput microscopic images or protein amino acid sequences. Unifying heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types, including images of protein expression in cells or tissues, amino acid sequences, and protein-protein interaction networks, to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicated that optimal integrations can considerably enhance the classification accuracy, and the utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we also investigate the contribution of different data sources and the influence of partial absence of data. This work is anticipated to provide clues for the reconciliation and combination of multi-source data for protein location analysis.
Affiliation(s)
- Ge Wang
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ying-Ying Xu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
8
Hu JX, Yang Y, Xu YY, Shen HB. Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images. Proteins 2021; 90:493-503. [PMID: 34546597 DOI: 10.1002/prot.26244] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/16/2020] [Revised: 03/16/2021] [Accepted: 09/13/2021] [Indexed: 12/17/2022]
Abstract
Analysis of protein subcellular localization is a critical part of proteomics. In recent years, as both the number and quality of microscopic images are increasing rapidly, many automated methods, especially convolutional neural networks (CNN), have been developed to predict protein subcellular location(s) based on bioimages, but their performance always suffers from some inherent properties of the problem. First, many microscopic images have non-informative or noisy sections, like unstained stroma and unspecific background, which affect the extraction of protein expression information. Second, the patterns of protein subcellular localization are very complex, as a lot of proteins locate in more than one compartment. In this study, we propose a new label-correlation enhanced deep neural network, laceDNN, to classify the subcellular locations of multi-label proteins from immunohistochemistry images. The model uses small representative patches as input to alleviate the image noise issue, and its backbone is a hybrid architecture of CNN and recurrent neural network, where the former network extracts representative image features and the latter learns the organelle dependency relationships. Our experimental results indicate that the proposed model can improve the performance of multi-label protein subcellular classification.
Affiliation(s)
- Jin-Xian Hu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yang Yang
  - Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China
- Ying-Ying Xu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
9
Li N, Huang H, Linsheng L, Lu H, Liu X. Improving glomerular filtration rate estimation by semi-supervised learning: a development and external validation study. Int Urol Nephrol 2021; 53:1649-1658. [PMID: 33710531 DOI: 10.1007/s11255-020-02771-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2020] [Accepted: 12/21/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Accurately estimating glomerular filtration rate (GFR) is crucial in both clinical practice and epidemiological surveys. We incorporated semi-supervised learning technology to improve GFR estimation performance.
METHODS The AASK [African American Study of Kidney Disease and Hypertension], CRIC [Chronic Renal Insufficiency Cohort] and DCCT [Diabetes Control and Complications Trial] studies were pooled for model development, whereas the MDRD [Modification of Diet in Renal Disease] and CRISP [Consortium for Radiological Imaging Studies of Polycystic Kidney Disease] studies were used for external model validation. A total of seven variables (serum creatinine, age, sex, Black race, diabetes status, hypertension and body mass index) were included as independent variables, while the outcome variable, GFR, was measured as the urinary clearance of 125I-iothalamate. The revised CKD-EPI [Chronic Kidney Disease Epidemiology Collaboration] creatinine equations were selected as the benchmark for performance comparisons. Head-to-head performance comparisons, from four-variable to seven-variable combinations, were conducted between the revised CKD-EPI equations and the semi-supervised models.
RESULTS For each combination of independent variables, the semi-supervised models consistently achieved superior results in all three performance indicators compared with the corresponding revised CKD-EPI equations in the external validation data set. Furthermore, compared with the revised four-variable CKD-EPI equation, the seven-variable semi-supervised model was less biased (mean of difference: 0.03 [-0.28, 0.34] vs 1.53 [1.28, 1.85], P < 0.001), more precise (interquartile range of difference: 7.94 [7.37, 8.50] vs 8.28 [7.76, 8.83], P = 0.1) and more accurate (P30: 88.9% [87.4%, 90.2%] vs 86.0% [84.4%, 87.4%], P < 0.001).
CONCLUSIONS The superior performance of the semi-supervised models during head-to-head comparisons supported the hypothesis that semi-supervised learning technology could improve GFR estimation performance.
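The three performance indicators named in this abstract (bias as the mean of the difference, precision as the interquartile range of the difference, and accuracy as P30) can be computed as in the sketch below. It assumes the difference is taken as measured minus estimated GFR and that P30 is the percentage of estimates within 30% of the measured value; the sign convention, the quartile method, the function name `gfr_metrics`, and the sample values are assumptions for illustration, not details from the study.

```python
from statistics import mean, quantiles

def gfr_metrics(measured, estimated):
    """Return (bias, precision_iqr, p30) for paired GFR values.

    bias          : mean of (measured - estimated); sign convention assumed
    precision_iqr : interquartile range of the same differences
    p30           : percentage of estimates within 30% of measured GFR
    """
    diffs = [m - e for m, e in zip(measured, estimated)]
    bias = mean(diffs)
    q1, _, q3 = quantiles(diffs, n=4)  # default 'exclusive' quartile method
    within_30 = sum(abs(e - m) <= 0.3 * m for m, e in zip(measured, estimated))
    p30 = 100 * within_30 / len(measured)
    return bias, q3 - q1, p30

# Hypothetical paired values in mL/min/1.73 m^2.
measured = [60, 90, 30, 100]
estimated = [66, 81, 45, 100]
bias, precision_iqr, p30 = gfr_metrics(measured, estimated)
print(bias, precision_iqr, p30)
```

In this toy data the third estimate (45 against a measured 30) is off by 50%, so only three of four estimates count toward P30.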
Affiliation(s)
- Ningshan Li
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hui Huang
  - Cardiovascular Department, The Eighth Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Lv Linsheng
  - Operation Room, The Third Affiliated Hospital of Sun Yat-Sen University, Guangdong, China
- Hui Lu
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
  - Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China
- Xun Liu
  - Clinical Data Center of the Third Affiliated Hospital of Sun Yat-Sen University, Guangdong, China
  - Division of Nephrology, Department of Internal Medicine, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510630, Guangdong, China
10
Abstract
The elucidation of the subcellular localization of proteins is very important for a deep understanding of their functions. In fact, protein activities are strictly correlated with the cellular compartment and microenvironment in which the proteins are present. In recent years, several effective and reliable proteomics techniques and computational methods have been developed and implemented to identify the subcellular localization of proteins. This process is often time-consuming and expensive, but recent technological and bioinformatics progress has allowed the development of more accurate and simpler workflows to determine the localization, interactions, and functions of proteins. In the following chapter, a brief introduction to the importance of knowing the subcellular localization of proteins will be presented. Then, sample preparation protocols, proteomic methods, data analysis strategies, and software for the prediction of protein localization will be presented and discussed. Finally, the more recent and advanced spatial proteomics techniques will be shown.
Affiliation(s)
- Elettra Barberis
  - Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Emilio Marengo
  - Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Marcello Manfredi
  - Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
11
Schormann W, Hariharan S, Andrews DW. A reference library for assigning protein subcellular localizations by image-based machine learning. J Cell Biol 2020; 219:133635. [PMID: 31968357 PMCID: PMC7055006 DOI: 10.1083/jcb.201904090] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/16/2019] [Revised: 09/30/2019] [Accepted: 12/15/2019] [Indexed: 12/11/2022] Open
Abstract
Confocal micrographs of EGFP fusion proteins localized at key cell organelles in murine and human cells were acquired for use as subcellular localization landmarks. For each of the respective 789,011 and 523,319 optically validated cell images, morphology and statistical features were measured. Machine learning algorithms using these features permit automated assignment of the localization of other proteins and dyes in both cell types with very high accuracy. Automated assignment of subcellular localizations for model tail-anchored proteins with randomly mutated C-terminal targeting sequences allowed the discovery of motifs responsible for targeting to mitochondria, endoplasmic reticulum, and the late secretory pathway. Analysis of directed mutants enabled refinement of these motifs and characterization of protein distributions within cellular subcompartments.
Affiliation(s)
- Wiebke Schormann
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada
- David W Andrews
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada; Department of Biochemistry, University of Toronto, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada
12
Camargo G, Bugatti PH, Saito PTM. Active semi-supervised learning for biological data classification. PLoS One 2020; 15:e0237428. [PMID: 32813738 PMCID: PMC7437865 DOI: 10.1371/journal.pone.0237428] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/12/2019] [Accepted: 07/27/2020] [Indexed: 11/18/2022] Open
Abstract
As datasets have grown continuously, efforts have been made to address the large amount of unlabeled data relative to the scarcity of labeled data. Another important issue is the trade-off between the difficulty of obtaining annotations from a specialist and the need for a significant amount of annotated data to obtain a robust classifier. In this context, combining active learning techniques with semi-supervised learning is of interest. A smaller number of more informative samples, previously selected (by the active learning strategy) and labeled by a specialist, can propagate their labels to a set of unlabeled data (through the semi-supervised strategy). However, most works in the literature neglect the need for interactive response times that can be required by certain real applications. We propose a more effective and efficient active semi-supervised learning framework, including a new active learning method. An extensive experimental evaluation was performed in the biological context (using the ALL-AML, Escherichia coli and PlantLeaves II datasets), comparing our proposals with state-of-the-art works from the literature and different supervised (SVM, RF, OPF) and semi-supervised (YATSI-SVM, YATSI-RF and YATSI-OPF) classifiers. From the obtained results, we can observe the benefits of our framework, which allows the classifier to achieve higher accuracies more quickly with a reduced number of annotated samples. Moreover, the selection criterion adopted by our active learning method, based on diversity and uncertainty, enables the prioritization of the most informative boundary samples for the learning process. We obtained a gain of up to 20% against other learning techniques. The active semi-supervised learning approaches presented a better trade-off (between accuracy and competitive, viable computational times) when compared with the active supervised learning ones.
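Illustrative sketch (not taken from the paper): the abstract describes a selection criterion based on uncertainty and diversity but does not give its details. The Python below is one possible reading under stated assumptions: uncertainty is measured as the margin between the two largest class probabilities, and diversity is enforced by a hypothetical minimum Euclidean distance `min_dist` between selected samples. The function names and the toy data are invented for illustration.

```python
import math

def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def select_queries(features, probs, k, min_dist):
    """Greedily pick up to k uncertain samples that lie at least min_dist apart."""
    # Visit samples from most to least uncertain.
    order = sorted(range(len(features)), key=lambda i: margin(probs[i]))
    chosen = []
    for i in order:
        if len(chosen) >= k:
            break
        # Diversity check: skip samples too close to one already selected.
        if all(math.dist(features[i], features[j]) >= min_dist for j in chosen):
            chosen.append(i)
    return chosen

# Toy unlabeled pool: 2-D features and predicted class probabilities.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 9.0)]
probs = [(0.51, 0.49), (0.52, 0.48), (0.60, 0.40), (0.99, 0.01)]
queries = select_queries(features, probs, k=2, min_dist=1.0)
print(queries)  # indices of the samples to send to the specialist
```

In this toy pool, sample 1 is nearly as uncertain as sample 0 but sits almost on top of it, so the diversity check passes over it in favor of sample 2.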
Affiliation(s)
- Guilherme Camargo
- Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
- Pedro H. Bugatti
- Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
- Priscila T. M. Saito
- Department of Computing, Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
- Institute of Computing, University of Campinas, Campinas, SP, Brazil
13
Nagao Y, Sakamoto M, Chinen T, Okada Y, Takao D. Robust classification of cell cycle phase and biological feature extraction by image-based deep learning. Mol Biol Cell 2020; 31:1346-1354. [PMID: 32320349 PMCID: PMC7353138 DOI: 10.1091/mbc.e20-03-0187] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/24/2022] Open
Abstract
Across the cell cycle, subcellular organization undergoes major spatiotemporal changes that could in principle contain biological features representing the cell cycle phase. We applied convolutional neural network-based classifiers to extract such putative features from fluorescence microscope images of cells stained for the nucleus, the Golgi apparatus, and the microtubule cytoskeleton. We demonstrate that cell images can be robustly classified according to G1/S and G2 cell cycle phases without the need for specific cell cycle markers. Grad-CAM analysis of the classification models enabled us to extract several pairs of quantitative parameters of specific subcellular features as good classifiers for the cell cycle phase. These results collectively demonstrate that machine learning-based image processing is useful for extracting biological features underlying cellular phenomena of interest in an unbiased and data-driven manner.
Affiliation(s)
- Yukiko Nagao
- Faculty of Pharmaceutical Sciences, The University of Tokyo, Tokyo 113-0033, Japan
- Mika Sakamoto
- Genome Informatics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
- Takumi Chinen
- Faculty of Pharmaceutical Sciences, The University of Tokyo, Tokyo 113-0033, Japan
- Yasushi Okada
- Department of Cell Biology and Anatomy and International Research Center for Neurointelligence (WPI-IRCN), Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan; Department of Physics and Universal Biology Institute (UBI), Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan; Laboratory for Cell Polarity Regulation, Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Daisuke Takao
- Department of Cell Biology and Anatomy and International Research Center for Neurointelligence (WPI-IRCN), Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan
14
Abstract
Background MicroRNAs (miRNAs) are a family of short, non-coding RNAs that have been linked to critical cellular activities, most notably regulation of gene expression. The identification of miRNA is a cross-disciplinary approach that requires both computational identification methods and wet-lab validation experiments, making it a resource-intensive procedure. While numerous machine learning methods have been developed to increase classification accuracy and thus reduce validation costs, most methods use supervised learning and thus require large labeled training data sets, often not feasible for less-sequenced species. On the other hand, there is now an abundance of unlabeled RNA sequence data due to the emergence of high-throughput wet-lab experimental procedures, such as next-generation sequencing. Results This paper explores the application of semi-supervised machine learning for miRNA classification in order to maximize the utility of both labeled and unlabeled data. We here present the novel combination of two semi-supervised approaches: active learning and multi-view co-training. Results across six diverse species show that this multi-stage semi-supervised approach is able to improve classification performance using very small numbers of labeled instances, effectively leveraging the available unlabeled data. Conclusions The proposed semi-supervised miRNA classification pipeline holds the potential to identify novel miRNA with high recall and precision while requiring very small numbers of previously known miRNA. Such a method could be highly beneficial when studying miRNA in newly sequenced genomes of niche species with few known examples of miRNA.
15
Multi-view Co-training for microRNA Prediction. Sci Rep 2019; 9:10931. [PMID: 31358877 PMCID: PMC6662744 DOI: 10.1038/s41598-019-47399-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/14/2019] [Accepted: 07/05/2019] [Indexed: 12/13/2022] Open
Abstract
MicroRNA (miRNA) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are often not available for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training samples, co-training is shown to significantly (p < 0.01) increase classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers with increases of 46.47%, 39.53% and 29.43%, for co-training, self-training, and passive learning, respectively. The final co-trained sequence and expression-based classifiers are integrated into a final confidence-based classifier which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data.
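Illustrative sketch (not the authors' implementation): the general idea of two-view co-training can be shown with a toy loop in Python. Everything here is an assumption made for illustration: each view is reduced to a single numeric feature, a nearest-centroid rule stands in for the real sequence- and expression-based classifiers, and each view pseudo-labels only its single most confident unlabelled sample per round. The labelled pool must contain at least two classes.

```python
def centroid_fit(xs, ys):
    """Mean feature value per class for a 1-D view."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means

def centroid_predict(means, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    dists = sorted((abs(x - m), label) for label, m in means.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def co_train(view_a, view_b, labels, rounds):
    """labels maps sample index -> class for the labelled pool; other indices are unlabelled."""
    labels = dict(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            unlabelled = [i for i in range(len(view)) if i not in labels]
            if not unlabelled:
                return labels
            idx = sorted(labels)
            model = centroid_fit([view[i] for i in idx], [labels[i] for i in idx])
            # This view pseudo-labels its most confident sample, which then
            # becomes training data for the other view as well.
            best = max(unlabelled, key=lambda i: centroid_predict(model, view[i])[1])
            labels[best] = centroid_predict(model, view[best])[0]
    return labels

# Toy data: sample 4 is ambiguous in view A (0.5) but clear in view B (0.1).
view_a = [0.0, 1.0, 0.2, 0.9, 0.5]
view_b = [0.1, 0.9, 0.4, 0.8, 0.1]
result = co_train(view_a, view_b, {0: 0, 1: 1}, rounds=2)
print(result)
```

The toy data are chosen so that one view resolves a sample the other view cannot, which is the situation co-training is meant to exploit.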
16
Li F, Zhang Y, Purcell AW, Webb GI, Chou KC, Lithgow T, Li C, Song J. Positive-unlabelled learning of glycosylation sites in the human proteome. BMC Bioinformatics 2019; 20:112. [PMID: 30841845 PMCID: PMC6404354 DOI: 10.1186/s12859-019-2700-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 07/05/2018] [Accepted: 02/22/2019] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND As an important type of post-translational modification (PTM), protein glycosylation plays a crucial role in protein stability and protein function. The abundance and ubiquity of protein glycosylation across three domains of life involving Eukarya, Bacteria and Archaea demonstrate its roles in regulating a variety of signalling and metabolic pathways. Mutations on and in the proximity of glycosylation sites are highly associated with human diseases. Accordingly, accurate prediction of glycosylation can complement laboratory-based methods and greatly benefit experimental efforts for characterization and understanding of functional roles of glycosylation. For this purpose, a number of supervised-learning approaches have been proposed to identify glycosylation sites, demonstrating a promising predictive performance. To train a conventional supervised-learning model, both reliable positive and negative samples are required. However, in practice, a large portion of negative samples (i.e. non-glycosylation sites) are mislabelled due to the limitation of current experimental technologies. Moreover, supervised algorithms often fail to take advantage of large volumes of unlabelled data, which can aid in model learning in conjunction with positive samples (i.e. experimentally verified glycosylation sites). RESULTS In this study, we propose a positive unlabelled (PU) learning-based method, PA2DE (V2.0), based on the AlphaMax algorithm for protein glycosylation site prediction. The predictive performance of this proposed method was evaluated by a range of glycosylation data collected over a ten-year period based on an interval of three years. Experiments using both benchmarking and independent tests show that our method outperformed the representative supervised-learning algorithms (including support vector machines and random forests) and one-class learners, as well as currently available prediction methods in terms of F1 score, accuracy and AUC measures. 
In addition, we developed an online web server as an implementation of the optimized model (available at http://glycomine.erc.monash.edu/Lab/GlycoMine_PU/) to facilitate community-wide efforts for accurate prediction of protein glycosylation sites. CONCLUSION The proposed PU learning approach achieved a competitive predictive performance compared with currently available methods. This PU learning schema may also be effectively employed and applied to address the prediction problems of other important types of protein PTM sites and functional sites.
Affiliation(s)
- Fuyi Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
- Yang Zhang
- College of Information Engineering, Northwest A and F University, Yangling, 712100 Shaanxi China
- Anthony W. Purcell
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Geoffrey I. Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478 USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054 China
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800 Australia
- Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
17
Shao W, Liu M, Xu YY, Shen HB, Zhang D. An Organelle Correlation-Guided Feature Selection Approach for Classifying Multi-Label Subcellular Bio-Images. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:828-838. [PMID: 28278481 DOI: 10.1109/tcbb.2017.2677907] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/06/2023]
Abstract
Nowadays, with the advances in microscopic imaging, accurate classification of bioimage-based protein subcellular location patterns has attracted as much attention as ever. One of the basic challenges is how to select the useful feature components among thousands of potential features to describe the images. This is not an easy task, especially considering the high ratio of multi-location proteins. Existing feature selection methods seldom take the correlation among different cellular compartments into consideration, and thus may miss some features that are co-important for several subcellular locations. To deal with this problem, we make use of the important structural correlation among different cellular compartments and propose an organelle structural correlation regularized feature selection method, CSF (Common-Sets of Features), in this paper. We formulate the multi-label classification problem by adopting a group-sparsity regularizer to select common subsets of relevant features from different cellular compartments. In addition, we add a cell structural correlation regularized Laplacian term, which utilizes the prior biological structural information to capture the intrinsic dependency among different cellular compartments. The CSF provides a new feature selection strategy for multi-label bio-image subcellular pattern classification, and the experimental results also show its superiority when compared with several existing algorithms.
18
Song Y, He L, Zhou F, Chen S, Ni D, Lei B, Wang T. Segmentation, Splitting, and Classification of Overlapping Bacteria in Microscope Images for Automatic Bacterial Vaginosis Diagnosis. IEEE J Biomed Health Inform 2016; 21:1095-1104. [PMID: 27479982 DOI: 10.1109/jbhi.2016.2594239] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/05/2022]
Abstract
Quantitative analysis of bacterial morphotypes in microscope images plays a vital role in the diagnosis of bacterial vaginosis (BV) based on the Nugent score criterion. However, there are two main challenges for this task: 1) It is quite difficult to identify the bacterial regions due to their varied appearance, faint boundaries, heterogeneous shapes, low contrast with the background, and small size relative to the image. 2) Numerous bacteria overlap each other, which hinders accurate analysis of individual bacteria. To overcome these challenges, we propose an automatic method in this paper to diagnose BV by quantitative analysis of bacterial morphotypes, which consists of a three-step approach, i.e., bacteria region segmentation, overlapping bacteria splitting, and bacterial morphotype classification. Specifically, we first segment the bacteria regions via saliency cut, which simultaneously evaluates the global contrast and spatial weighted coherence. A Markov random field model is then applied for high-quality unsupervised segmentation of small objects. We then decompose overlapping bacteria clumps into markers, and associate each pixel with markers to identify evidence for the eventual splitting of individual bacteria. Next, we extract morphotype features from each bacterium to learn the descriptors and to characterize the types of bacteria using an Adaptive Boosting machine learning framework. Finally, BV diagnosis is implemented based on the Nugent score criterion. Experiments demonstrate that our proposed method achieves high accuracy and computational efficiency for BV diagnosis.
19
Xu YY, Yang F, Shen HB. Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction. Bioinformatics 2016; 32:2184-92. [DOI: 10.1093/bioinformatics/btw219] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/02/2015] [Accepted: 04/18/2016] [Indexed: 01/08/2023] Open
20
Ahmed Z, Dandekar T. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format. F1000Res 2015; 4:1453. [PMID: 29721305 PMCID: PMC5897790 DOI: 10.12688/f1000research.7329.3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 03/26/2018] [Indexed: 01/12/2023] Open
Abstract
Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments, e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medicinal imaging like electroencephalography (EEG), magnetoencephalography (MEG), echocardiography (ECG), positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in the scientific and medical communities, as they play a vital role in providing major original data, experimental and computational results in concise form. One major challenge for implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product line architecture based bioinformatics tool ‘Mining Scientific Literature (MSL)’, which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures and extraction of embedded text from all kinds of biological and biomedical figures using applied Optical Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system’s output in different formats including text, PDF, XML and image files. Hence, MSL is an easy to install and use analysis tool to interpret published scientific literature in PDF format.
Affiliation(s)
- Zeeshan Ahmed
- Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, Farmington, CT, 06032, USA; Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06032, USA
- Thomas Dandekar
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, 97074, Germany
21
Shao W, Liu M, Zhang D. Human cell structure-driven model construction for predicting protein subcellular location from biological images. Bioinformatics 2015; 32:114-21. [PMID: 26363175 DOI: 10.1093/bioinformatics/btv521] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/21/2015] [Accepted: 08/31/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. RESULTS We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. AVAILABILITY AND IMPLEMENTATION The dataset and code can be downloaded from https://github.com/shaoweinuaa/. CONTACT dqzhang@nuaa.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
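Illustrative sketch (not the SC-PSorter code): under the error correcting output coding framework mentioned in the abstract, each class is assigned a binary codeword, one binary classifier is trained per codeword column, and a prediction is decoded by finding the closest codeword. The Python below shows only the decoding step, using minimum Hamming distance. The class names and the codeword matrix are hypothetical and do not reproduce the structure-driven matrix from the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(codewords, bits):
    """Return the class whose codeword is closest to the predicted bit vector."""
    return min(codewords, key=lambda label: hamming(codewords[label], bits))

# Hypothetical codeword matrix: one row per class, one column per binary classifier.
codewords = {
    "nucleus":      (1, 1, 0, 0, 1),
    "cytoplasm":    (0, 1, 1, 0, 0),
    "mitochondria": (1, 0, 1, 1, 0),
}

# Outputs of the five binary classifiers for one image; the fourth bit is
# flipped relative to the "nucleus" codeword, simulating one classifier error.
predicted = ecoc_decode(codewords, (1, 1, 0, 1, 1))
print(predicted)
```

Because the codewords differ from one another in several positions, a single wrong binary output still decodes to the intended class, which is the error-correcting property the framework relies on.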
Affiliation(s)
- Wei Shao
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Mingxia Liu
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Daoqiang Zhang
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
|