1
Wang Y, Sun H, Sheng N, He K, Hou W, Zhao Z, Yang Q, Huang L. ESMSec: Prediction of Secreted Proteins in Human Body Fluids Using Protein Language Models and Attention. Int J Mol Sci 2024; 25:6371. [PMID: 38928078 PMCID: PMC11204320 DOI: 10.3390/ijms25126371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 06/02/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024]
Abstract
The secreted proteins of human body fluid have the potential to be used as biomarkers for diseases. These biomarkers can be used for early diagnosis and risk prediction of diseases, so the study of secreted proteins of human body fluid has great application value. In recent years, the deep-learning-based transformer language model has transferred from the field of natural language processing (NLP) to the field of proteomics, leading to the development of protein language models (PLMs) for protein sequence representation. Here, we propose a deep learning framework called ESM Predict Secreted Proteins (ESMSec) to predict three types of proteins secreted in human body fluid. The ESMSec is based on the ESM2 model and attention architecture. Specifically, the protein sequence data are firstly put into the ESM2 model to extract the feature information from the last hidden layer, and all the input proteins are encoded into a fixed 1000 × 480 matrix. Secondly, multi-head attention with a fully connected neural network is employed as the classifier to perform binary classification according to whether they are secreted into each body fluid. Our experiment utilized three human body fluids that are important and ubiquitous markers. Experimental results show that ESMSec achieved average accuracy of 0.8486, 0.8358, and 0.8325 on the testing datasets for plasma, cerebrospinal fluid (CSF), and seminal fluid, which on average outperform the state-of-the-art (SOTA) methods. The outstanding performance results of ESMSec demonstrate that the ESM can improve the prediction performance of the model and has great potential to screen the secretion information of human body fluid proteins.
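The abstract states that every protein is encoded as a fixed 1000 × 480 matrix but does not say how sequences of other lengths are handled. A minimal sketch, assuming truncation of long proteins and zero-padding of short ones (the function name and the padding choice are illustrative assumptions, not the authors' code):

```python
MAX_LEN = 1000   # fixed number of residue positions (from the abstract)
EMBED_DIM = 480  # per-residue feature width (from the abstract)


def to_fixed_matrix(residue_vectors, max_len=MAX_LEN, dim=EMBED_DIM):
    """Truncate or zero-pad per-residue vectors to a max_len x dim matrix.

    Assumption: zero-padding and truncation; the abstract only gives the
    final matrix size.
    """
    for vec in residue_vectors:
        if len(vec) != dim:
            raise ValueError(f"expected vectors of width {dim}, got {len(vec)}")
    # Keep at most max_len residues, copying so the input is not mutated.
    rows = [list(vec) for vec in residue_vectors[:max_len]]
    # Append all-zero rows until the matrix has exactly max_len rows.
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows
```

A matrix built this way has the same shape for every protein, which is what a downstream attention classifier with fixed input size would need.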
Affiliation(s)
- Yan Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Huiting Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Nan Sheng
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Kai He
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48103, USA
- Wenjv Hou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Ziqi Zhao
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Qixing Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Lan Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
2
He K, Wang Y, Xie X, Shao D. Prediction of Proteins in Cerebrospinal Fluid and Application to Glioma Biomarker Identification. Molecules 2023; 28:3617. [PMID: 37110850 PMCID: PMC10144833 DOI: 10.3390/molecules28083617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 04/18/2023] [Accepted: 04/19/2023] [Indexed: 04/29/2023]
Abstract
Cerebrospinal fluid (CSF) proteins are very important because they can serve as biomarkers for central nervous system diseases. Although many CSF proteins have been identified with wet experiments, the identification of CSF proteins is still a challenge. In this paper, we propose a novel method to predict proteins in CSF based on protein features. A two-stage feature-selection method is employed to remove irrelevant features and redundant features. The deep neural network and bagging method are used to construct the model for the prediction of CSF proteins. The experiment results on the independent testing dataset demonstrate that our method performs better than other methods in the prediction of CSF proteins. Furthermore, our method is also applied to the identification of glioma biomarkers. A differentially expressed gene analysis is performed on the glioma data. After combining the analysis results with the prediction results of our model, the biomarkers of glioma are identified successfully.
Affiliation(s)
- Kai He
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Dan Shao
  - College of Computer Science and Technology, Changchun University, Changchun 130022, China
3
MultiSec: Multi-Task Deep Learning Improves Secreted Protein Discovery in Human Body Fluids. MATHEMATICS 2022. [DOI: 10.3390/math10152562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Prediction of secreted proteins in human body fluids is essential since secreted proteins hold promise as disease biomarkers. Various approaches have been proposed to predict whether a protein is secreted into a specific fluid by its sequence. However, there may be relationships between different human body fluids when proteins are secreted into these fluids. Current approaches ignore these relationships directly, and therefore their performances are limited. Here, we present MultiSec, an improved approach for secreted protein discovery to exploit relationships between fluids via multi-task learning. Specifically, a sampling-based balance strategy is proposed to solve imbalance problems in all fluids, an effective network is presented to extract features for all fluids, and multi-objective gradient descent is employed to prevent fluids from hurting each other. MultiSec was trained and tested in 17 human body fluids. The comparison benchmarks on the independent testing datasets demonstrate that our approach outperforms other available approaches in all compared fluids.
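The abstract mentions a sampling-based balance strategy without describing it. One common way to balance a binary dataset by sampling is random undersampling of the larger class; the sketch below illustrates that idea only and is an assumption, not the MultiSec procedure (the function name and the `seed` parameter are hypothetical):

```python
import random


def balance_by_undersampling(samples, labels, seed=0):
    """Return a shuffled list of (sample, label) pairs with equal class counts.

    Assumption: labels are 1 (secreted into the fluid) or 0 (not secreted),
    and the larger class is randomly undersampled to the smaller one.
    """
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(positives), len(negatives))
    # Draw n items without replacement from each class.
    balanced = [(s, 1) for s in rng.sample(positives, n)]
    balanced += [(s, 0) for s in rng.sample(negatives, n)]
    rng.shuffle(balanced)
    return balanced
```

Undersampling discards data from the larger class; a multi-fluid setting would apply some balancing step per fluid, since class ratios differ between fluids.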
4
DenSec: Secreted Protein Prediction in Cerebrospinal Fluid Based on DenseNet and Transformer. MATHEMATICS 2022. [DOI: 10.3390/math10142490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Cerebrospinal fluid (CSF) exists in the surrounding spaces of mammalian central nervous systems (CNS); therefore, there are numerous potential protein biomarkers associated with CNS disease in CSF. Currently, approximately 4300 proteins have been identified in CSF by protein profiling. However, due to the diverse modifications, as well as the existing technical limits, large-scale protein identification in CSF is still considered a challenge. Inspired by computational methods, this paper proposes a deep learning framework, named DenSec, for secreted protein prediction in CSF. In the first phase of DenSec, all input proteins are encoded as a matrix with a fixed size of 1000 × 20 by calculating a position-specific score matrix (PSSM) of protein sequences. In the second phase, a dense convolutional network (DenseNet) is adopted to extract the feature from these PSSMs automatically. After that, Transformer with a fully connected dense layer acts as classifier to perform a binary classification in terms of secretion into CSF or not. According to the experiment results, DenSec achieves a mean accuracy of 86.00% in the test dataset and outperforms the state-of-the-art methods.
5
Shao D, Huang L, Wang Y, He K, Cui X, Wang Y, Ma Q, Cui J. DeepSec: a deep learning framework for secreted protein discovery in human body fluids. Bioinformatics 2021; 38:228-235. [PMID: 34398224 PMCID: PMC8696095 DOI: 10.1093/bioinformatics/btab545] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2021] [Revised: 06/17/2021] [Accepted: 08/13/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Human proteins that are secreted into different body fluids from various cells and tissues can be promising disease indicators. Modern proteomics research empowered by both qualitative and quantitative profiling techniques has made great progress in protein discovery in various human fluids. However, due to the large number of proteins and diverse modifications present in the fluids, as well as the existing technical limits of major proteomics platforms (e.g. mass spectrometry), large discrepancies are often generated from different experimental studies. As a result, a comprehensive proteomics landscape across major human fluids is not well determined. RESULTS To bridge this gap, we have developed a deep learning framework, named DeepSec, to identify secreted proteins in 12 types of human body fluids. DeepSec adopts an end-to-end sequence-based approach, where a Convolutional Neural Network is built to learn the abstract sequence features followed by a Bidirectional Gated Recurrent Unit with fully connected layer for protein classification. DeepSec has demonstrated promising performance, with average areas under the ROC curve of 0.85-0.94 on testing datasets in each type of fluid, which outperforms existing state-of-the-art methods available mostly on blood proteins. As an illustration of how to apply DeepSec in biomarker discovery research, we conducted a case study on kidney cancer by using genomics data from The Cancer Genome Atlas and have identified 104 possible marker proteins. AVAILABILITY DeepSec is available at https://bmbl.bmi.osumc.edu/deepsec/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
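The performance figure quoted above is the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive protein is scored above a randomly chosen negative one, counting ties as one half. It is a generic illustration with a hypothetical function name, not code from DeepSec:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (positive) / 0 (negative)
    scores: predicted scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based implementations give the same value faster on large test sets.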
Affiliation(s)
- Dan Shao
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - College of Computer Science and Technology, Changchun University, Changchun 130022, China
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Kai He
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Xueteng Cui
  - College of Computer Science and Technology, Changchun University, Changchun 130022, China
- Yao Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Juan Cui
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
6
Shao D, Huang L, Wang Y, Cui X, Li Y, Wang Y, Ma Q, Du W, Cui J. HBFP: a new repository for human body fluid proteome. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6395039. [PMID: 34642750 PMCID: PMC8516408 DOI: 10.1093/database/baab065] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/12/2021] [Revised: 09/23/2021] [Accepted: 09/28/2021] [Indexed: 12/15/2022]
Abstract
Body fluid proteome has been intensively studied as a primary source for disease biomarker discovery. Using advanced proteomics technologies, early research success has resulted in increasingly accumulated proteins detected in different body fluids, among which many are promising biomarkers. However, despite a handful of small-scale and specific data resources, current research is clearly lacking effort compiling published body fluid proteins into a centralized and sustainable repository that can provide users with systematic analytic tools. In this study, we developed a new database of human body fluid proteome (HBFP) that focuses on experimentally validated proteome in 17 types of human body fluids. The current database archives 11 827 unique proteins reported by 164 scientific publications, with a maximal false discovery rate of 0.01 on both the peptide and protein levels since 2001, and enables users to query, analyze and download protein entries with respect to each body fluid. Three unique features of this new system include the following: (i) the protein annotation page includes detailed abundance information based on relative qualitative measures of peptides reported in the original references, (ii) a new score is calculated on each reported protein to indicate the discovery confidence and (iii) HBFP catalogs 7354 proteins with at least two non-nested uniquely mapping peptides of nine amino acids according to the Human Proteome Project Data Interpretation Guidelines, while the remaining 4473 proteins have more than two unique peptides without given sequence information. As an important resource for human protein secretome, we anticipate that this new HBFP database can be a powerful tool that facilitates research in clinical proteomics and biomarker discovery. Database URL: https://bmbl.bmi.osumc.edu/HBFP/
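The criterion in feature (iii), at least two non-nested peptides of nine amino acids, can be made concrete with a small check. The sketch below treats nine residues as a minimum length and "non-nested" as neither peptide being a substring of the other; both readings are interpretive assumptions, unique mapping to the protein is assumed to have been verified upstream, and the function name is hypothetical:

```python
from itertools import combinations


def has_two_non_nested_peptides(peptides, min_len=9):
    """True if two sufficiently long peptides exist with neither inside the other.

    Assumptions: min_len is a lower bound on peptide length, and each peptide
    in the input already maps uniquely to the protein being evaluated.
    """
    # Deduplicate and drop peptides that are too short.
    eligible = sorted({p for p in peptides if len(p) >= min_len})
    for a, b in combinations(eligible, 2):
        # Nested means one peptide's sequence is contained in the other's.
        if a not in b and b not in a:
            return True
    return False
```

For example, `ACDEFGHIKL` and `CDEFGHIKL` are nested (the second is contained in the first), so that pair alone would not satisfy the check.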
Affiliation(s)
- Dan Shao
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, 122E Avery Hall, 1144 T St., Lincoln, NE 68588, USA
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
  - Department of Computer Science and Technology, Changchun University, 6543 Weixing Road, Changchun 130022, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Xueteng Cui
  - Department of Computer Science and Technology, Changchun University, 6543 Weixing Road, Changchun 130022, China
- Yufei Li
  - Department of Computer Science and Technology, Changchun University, 6543 Weixing Road, Changchun 130022, China
- Yao Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, 310G Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, China
- Juan Cui
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, 122E Avery Hall, 1144 T St., Lincoln, NE 68588, USA
7
SecProCT: In Silico Prediction of Human Secretory Proteins Based on Capsule Network and Transformer. Int J Mol Sci 2021; 22:9054. [PMID: 34445760 PMCID: PMC8396571 DOI: 10.3390/ijms22169054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/13/2021] [Revised: 08/12/2021] [Accepted: 08/20/2021] [Indexed: 12/23/2022]
Abstract
Identifying secretory proteins from blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and are highly dependent on the feature set from the protein. In this article, we propose a deep learning model based on the capsule network and transformer architecture, SecProCT, to predict secretory proteins using only amino acid sequences. The proposed model was validated using cross-validation and achieved 0.921 and 0.892 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively. Meanwhile, the proposed model was validated on an independent test set and achieved 0.917 and 0.905 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively, which are better than conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins. The results of this model are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on the annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.
8
Janigro D, Bailey DM, Lehmann S, Badaut J, O'Flynn R, Hirtz C, Marchi N. Peripheral Blood and Salivary Biomarkers of Blood-Brain Barrier Permeability and Neuronal Damage: Clinical and Applied Concepts. Front Neurol 2021; 11:577312. [PMID: 33613412 PMCID: PMC7890078 DOI: 10.3389/fneur.2020.577312] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/29/2020] [Accepted: 12/01/2020] [Indexed: 12/12/2022]
Abstract
Within the neurovascular unit (NVU), the blood–brain barrier (BBB) operates as a key cerebrovascular interface, dynamically insulating the brain parenchyma from peripheral blood and compartments. Increased BBB permeability is clinically relevant for at least two reasons: it actively participates to the etiology of central nervous system (CNS) diseases, and it enables the diagnosis of neurological disorders based on the detection of CNS molecules in peripheral body fluids. In pathological conditions, a suite of glial, neuronal, and pericyte biomarkers can exit the brain reaching the peripheral blood and, after a process of filtration, may also appear in saliva or urine according to varying temporal trajectories. Here, we specifically examine the evidence in favor of or against the use of protein biomarkers of NVU damage and BBB permeability in traumatic head injury, including sport (sub)concussive impacts, seizure disorders, and neurodegenerative processes such as Alzheimer's disease. We further extend this analysis by focusing on the correlates of human extreme physiology applied to the NVU and its biomarkers. To this end, we report NVU changes after prolonged exercise, freediving, and gravitational stress, focusing on the presence of peripheral biomarkers in these conditions. The development of a biomarker toolkit will enable minimally invasive routines for the assessment of brain health in a broad spectrum of clinical, emergency, and sport settings.
Affiliation(s)
- Damir Janigro
  - Department of Physiology Case Western Reserve University, Cleveland, OH, United States
  - FloTBI Inc., Cleveland, OH, United States
- Damian M Bailey
  - Neurovascular Research Laboratory, Faculty of Life Sciences and Education, University of South Wales, Wales, United Kingdom
- Sylvain Lehmann
  - IRMB, INM, UFR Odontology, University Montpellier, INSERM, CHU Montpellier, CNRS, Montpellier, France
- Jerome Badaut
  - Brain Molecular Imaging Lab, CNRS UMR 5287, INCIA, University of Bordeaux, Bordeaux, France
- Robin O'Flynn
  - IRMB, INM, UFR Odontology, University Montpellier, INSERM, CHU Montpellier, CNRS, Montpellier, France
- Christophe Hirtz
  - IRMB, INM, UFR Odontology, University Montpellier, INSERM, CHU Montpellier, CNRS, Montpellier, France
- Nicola Marchi
  - Cerebrovascular and Glia Research, Department of Neuroscience, Institute of Functional Genomics (UMR 5203 CNRS-U 1191 INSERM, University of Montpellier), Montpellier, France
9
Huang L, Shao D, Wang Y, Cui X, Li Y, Chen Q, Cui J. Human body-fluid proteome: quantitative profiling and computational prediction. Brief Bioinform 2021; 22:315-333. [PMID: 32020158 PMCID: PMC7820883 DOI: 10.1093/bib/bbz160] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 07/05/2019] [Revised: 08/22/2019] [Accepted: 10/18/2019] [Indexed: 12/15/2022]
Abstract
Empowered by the advancement of high-throughput bio technologies, recent research on body-fluid proteomes has led to the discoveries of numerous novel disease biomarkers and therapeutic drugs. In the meantime, a tremendous progress in disclosing the body-fluid proteomes was made, resulting in a collection of over 15 000 different proteins detected in major human body fluids. However, common challenges remain with current proteomics technologies about how to effectively handle the large variety of protein modifications in those fluids. To this end, computational effort utilizing statistical and machine-learning approaches has shown early successes in identifying biomarker proteins in specific human diseases. In this article, we first summarized the experimental progresses using a combination of conventional and high-throughput technologies, along with the major discoveries, and focused on current research status of 16 types of body-fluid proteins. Next, the emerging computational work on protein prediction based on support vector machine, ranking algorithm, and protein-protein interaction network were also surveyed, followed by algorithm and application discussion. At last, we discuss additional critical concerns about these topics and close the review by providing future perspectives especially toward the realization of clinical disease biomarker discovery.
Affiliation(s)
- Lan Huang
  - College of Computer Science and Technology, Jilin University
- Dan Shao
  - College of Computer Science and Technology, Jilin University
  - College of Computer Science and Technology, Changchun University
- Yan Wang
  - College of Computer Science and Technology, Jilin University
- Xueteng Cui
  - College of Computer Science and Technology, Changchun University
- Yufei Li
  - College of Computer Science and Technology, Changchun University
- Qian Chen
  - College of Computer Science and Technology, Jilin University
- Juan Cui
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln
10
Zhang J, Zhang Y, Li Y, Guo S, Yang G. Identification of Cancer Biomarkers in Human Body Fluids by Using Enhanced Physicochemical-incorporated Evolutionary Conservation Scheme. Curr Top Med Chem 2020; 20:1888-1897. [PMID: 32648847 DOI: 10.2174/1568026620666200710100743] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/15/2019] [Revised: 03/01/2020] [Accepted: 03/02/2020] [Indexed: 02/07/2023]
Abstract
OBJECTIVE Cancer is one of the most serious diseases affecting human health. Among all current cancer treatments, early diagnosis and control significantly help increase the chances of cure. Detecting cancer biomarkers in body fluids is now attracting more attention among oncologists. In-silico predictions of body fluid-related proteins, which can serve as cancer biomarkers, open a door for labor-intensive and time-consuming biochemical experiments. METHODS In this work, we propose a novel method for high-throughput identification of cancer biomarkers in human body fluids. We incorporate physicochemical properties into the weighted observed percentages (WOP) and position-specific scoring matrices (PSSM) profiles to enhance their attributes that reflect the evolutionary conservation of the body fluid-related proteins. The least absolute shrinkage and selection operator (LASSO) feature selection strategy is introduced to generate the optimal feature subset. RESULTS The ten-fold cross-validation results on training datasets demonstrate the accuracy of the proposed model. We also test our proposed method on independent testing datasets and apply it to the identification of potential cancer biomarkers in human body fluids. CONCLUSION The testing results promise a good generalization capability of our approach.
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Yu Zhang
  - Information Engineering College, Huanghuai University, Zhumadian, China
- Yanlin Li
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Song Guo
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Guifu Yang
  - College of Information Science and Technology, Northeast Normal University, Changchun, China
11
Du W, Sun Y, Li G, Cao H, Pang R, Li Y. CapsNet-SSP: multilane capsule network for predicting human saliva-secretory proteins. BMC Bioinformatics 2020; 21:237. [PMID: 32517646 PMCID: PMC7285745 DOI: 10.1186/s12859-020-03579-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/10/2020] [Accepted: 06/01/2020] [Indexed: 01/24/2023]
Abstract
Background Compared with disease biomarkers in blood and urine, biomarkers in saliva have distinct advantages in clinical tests, as they can be conveniently examined through noninvasive sample collection. Therefore, identifying human saliva-secretory proteins and further detecting protein biomarkers in saliva have significant value in clinical medicine. There are only a few methods for predicting saliva-secretory proteins based on conventional machine learning algorithms, and all are highly dependent on annotated protein features. Unlike conventional machine learning algorithms, deep learning algorithms can automatically learn feature representations from input data and thus hold promise for predicting saliva-secretory proteins. Results We present a novel end-to-end deep learning model based on multilane capsule network (CapsNet) with differently sized convolution kernels to identify saliva-secretory proteins only from sequence information. The proposed model CapsNet-SSP outperforms existing methods based on conventional machine learning algorithms. Furthermore, the model performs better than other state-of-the-art deep learning architectures mostly used to analyze biological sequences. In addition, we further validate the effectiveness of CapsNet-SSP by comparison with human saliva-secretory proteins from existing studies and known salivary protein biomarkers of cancer. Conclusions The main contributions of this study are as follows: (1) an end-to-end model based on CapsNet is proposed to identify saliva-secretory proteins from the sequence information; (2) the proposed model achieves better performance and outperforms existing models; and (3) the saliva-secretory proteins predicted by our model are statistically significant compared with existing cancer biomarkers in saliva. In addition, a web server of CapsNet-SSP is developed for saliva-secretory protein identification, and it can be accessed at the following URL: http://www.csbg-jlu.info/CapsNet-SSP/. 
We believe that our model and web server will be useful for biomedical researchers who are interested in finding salivary protein biomarkers, especially when they have identified candidate proteins for analyzing diseased tissues near or distal to salivary glands using transcriptome or proteomics.
Affiliation(s)
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yu Sun
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Gaoyang Li
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Huansheng Cao
  - Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA
- Ran Pang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Ying Li
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
12
Zhang J, Zhang Y, Ma Z. In silico Prediction of Human Secretory Proteins in Plasma Based on Discrete Firefly Optimization and Application to Cancer Biomarkers Identification. Front Genet 2019; 10:542. [PMID: 31244885 PMCID: PMC6563772 DOI: 10.3389/fgene.2019.00542] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/04/2019] [Accepted: 05/21/2019] [Indexed: 12/20/2022]
Abstract
Early control and prevention of cancer contribute to effective interventions and cancer therapies. Secretory proteins, one of the richest sources of biomarkers, have proved important as molecular signposts of the physiological state of a cell. In this work, we aim to propose a high-throughput proteomic technology platform to facilitate early cancer detection by means of biomarkers that are secreted into the bloodstream. We compile a new benchmark dataset of human secretory proteins in plasma. A series of sequence-derived features, which have been shown to be involved in the structure and function of secretory proteins, are collected to mathematically encode these proteins. Considering the influence of potentially irrelevant or redundant features, we introduce a discrete firefly optimization algorithm to perform feature selection. We evaluate and compare the proposed method, SCRIP (Secretory proteins in plasma), with state-of-the-art approaches on benchmark datasets and independent testing datasets. SCRIP achieves average AUC values of 0.876 and 0.844 in five-fold cross-validation and the independent test, respectively. In addition, we test SCRIP on proteins in four types of cancer tissues and successfully detect 66-77% of potential cancer biomarkers.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Henan Key Laboratory of Education Big Data Analysis and Application, Xinyang, China
- Yu Zhang
- Information Engineering College, Huanghuai University, Zhumadian, China
- Henan Key Laboratory of Smart Lighting, Zhumadian, China
- Zhiqiang Ma
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun, China
|
13
|
Zhou C, Li J, Li Q, Liu H, Ye D, Wu Z, Shen Z, Deng H. The clinical significance of HOXA9 promoter hypermethylation in head and neck squamous cell carcinoma. J Clin Lab Anal 2019; 33:e22873. [PMID: 30843252 PMCID: PMC6595302 DOI: 10.1002/jcla.22873] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/07/2018] [Revised: 01/13/2019] [Accepted: 02/10/2019] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The purpose of the current study was to assess the association between HOXA9 (homeobox A9) promoter methylation and head and neck squamous cell carcinoma (HNSCC) and its diagnostic value. METHODS Quantitative methylation-specific PCR (qMSP) was applied to measure HOXA9 promoter methylation levels in 145 paired HNSCC and corresponding normal tissue samples. Data from The Cancer Genome Atlas (TCGA) database (n = 578; 528 HNSCC and 50 normal) were also analyzed. RESULTS Significantly higher levels of HOXA9 promoter methylation were detected in HNSCC tissues than in normal tissues (our cohort: P = 1.06E-35; TCGA cohort: P = 3.06E-39). Moreover, HOXA9 methylation was significantly increased in patients with advanced tumor (T) stage, lymph node metastasis, and advanced clinical stage. Areas under the receiver operating characteristic curves (AUCs) based on our cohort and TCGA data were 0.930 and 0.967, respectively. CONCLUSION In summary, our study reveals that HOXA9 promoter hypermethylation contributes to the risk of HNSCC and its progression and metastasis. Additionally, HOXA9 hypermethylation is a potential biomarker for the early diagnosis and screening of patients with HNSCC.
Affiliation(s)
- Chongchang Zhou
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China; Laboratory of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China
- Jinyun Li
- Department of Oncology and Hematology, The Affiliated Hospital of Medical School of Ningbo University, Ningbo, China
- Qun Li
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China; Laboratory of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China
- Huigao Liu
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Zhenhai Longsai Hospital, Ningbo, China
- Dong Ye
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China; Laboratory of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China
- Zhenhua Wu
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center, Lihuili Eastern Hospital, Ningbo, China
- Zhisen Shen
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China; Laboratory of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China
- Hongxia Deng
- Department of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China; Laboratory of Otorhinolaryngology Head and Neck Surgery, Ningbo Medical Center Lihuili Hospital, Ningbo, China
|
14
|
Zhang J, Chai H, Guo S, Guo H, Li Y. High-Throughput Identification of Mammalian Secreted Proteins Using Species-Specific Scheme and Application to Human Proteome. Molecules 2018; 23:molecules23061448. [PMID: 29903999 PMCID: PMC6099666 DOI: 10.3390/molecules23061448] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/13/2018] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 02/02/2023] Open
Abstract
Secreted proteins are widespread in living organisms and cells. Because secreted proteins are easily detected in body fluids such as urine and saliva in clinical diagnosis, they play important roles as biomarkers for disease diagnosis and in vaccine production. In this study, we propose a novel predictor for accurate high-throughput identification of mammalian secreted proteins that is based on sequence-derived features. We combine the features of amino acid composition, sequence motifs, and physicochemical properties to encode the collected proteins. Detailed feature analyses demonstrate the effectiveness of the considered features. Based on the differences among secreted proteins across species, we introduce a species-specific scheme, which is expected to further explore the intrinsic attributes of specific secreted proteins. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method. A test on an independent dataset also indicates good generalization capability. Compared with the traditional universal model, we experimentally demonstrate that the species-specific scheme significantly improves prediction performance. We use our method to make predictions on the unreviewed human proteome and find 272 potential secreted proteins with probabilities higher than 99%. A user-friendly web server named iMSPs (identification of Mammalian Secreted Proteins), which implements our proposed method, is freely available for academic use at: http://www.inforstation.com/webservers/iMSP/.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Haiting Chai
- College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Song Guo
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Huaping Guo
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Yanling Li
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
|
15
|
Rodrigues LM, Magrini TD, Lima CF, Scholz J, da Silva Martinho H, Almeida JD. Effect of smoking cessation in saliva compounds by FTIR spectroscopy. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2017; 174:124-129. [PMID: 27888782 DOI: 10.1016/j.saa.2016.11.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/20/2016] [Revised: 10/15/2016] [Accepted: 11/09/2016] [Indexed: 05/27/2023]
Abstract
INTRODUCTION Smoking is currently considered one of the biggest risk factors for the development of various diseases and early death. Fourier transform infrared (FTIR) spectroscopy is a valuable tool for analysis of biofluids such as saliva and is considered useful for diagnostic purposes. The aim of this study was to evaluate the effect of smoking cessation on saliva composition by FTIR spectroscopy. METHODS We analyzed the saliva of participants in two groups: a smoker group made up of 10 chronic smokers and a former smoker group made up of 10 individuals who had stopped smoking. Members of both groups had similar smoking history. RESULTS The results showed few differences in spectral intensity between the groups; however, spectral peaks were slightly increased in the group of smokers in the bands for DNA, indicating modification of its content or cell necrosis. They were also increased for the mannose-6-phosphatase molecule, which is expressed in prostate and breast carcinomas. In the former smoker group, the thiocyanate peak was decreased and the band referring to collagen increased in intensity, which indicates a better tissue regeneration capacity. CONCLUSION Considering these results and the fact that tobacco intake was similar between the groups, it can be concluded that there was recovery of tissue regeneration capacity with smoking cessation during the study period, although the effects found in smokers persisted in the bodies of those who had given up smoking.
Affiliation(s)
- Laís Morandini Rodrigues
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, UNESP - Univ Estadual Paulista, Brazil; Biologic & Materials Sciences, Division of Prosthodontics, University of Michigan, United States
- Celina Faig Lima
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, UNESP - Univ Estadual Paulista, Brazil
- Jaqueline Scholz
- Smoking Cessation Program Department, Heart Institute, University of Sao Paulo Medical School, Brazil
- Janete Dias Almeida
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, UNESP - Univ Estadual Paulista, Brazil
|
16
|
Kaddi CD, Coulter WH, Wang MD. Developing Robust Predictive Models for Head and Neck Cancer across Microarray and RNA-seq Data. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2015; 2015:393-402. [PMID: 29568818 PMCID: PMC5859557 DOI: 10.1145/2808719.2808760] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
Increased understanding of the transcriptomic patterns underlying head and neck squamous cell carcinoma (HNSCC) can facilitate earlier diagnosis and better treatment outcomes. Integrating knowledge from multiple studies is necessary to identify fundamental, consistent gene expression signatures that distinguish HNSCC patient samples from disease-free samples, particularly for detecting HNSCC at an early pathological stage. This study utilizes feature integration and heterogeneous ensemble modeling techniques to develop robust models for predicting HNSCC disease status in both microarray and RNA-seq datasets. Several alternative models demonstrated good performance, with MCC and AUC values exceeding 0.8. These models were also applied to discriminate between early pathological stage HNSCC and normal RNA-seq samples, showing encouraging results. The predictive modeling workflow was integrated into a software tool with a graphical user interface. This tool enables HNSCC researchers to harness frequently observed transcriptomic features and ensembles of previously developed models when investigating new HNSCC gene expression datasets.
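The two metrics named in this abstract, MCC (Matthews correlation coefficient) and AUC (area under the ROC curve), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `mcc` and `auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease sample, 0 = disease-free sample.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"MCC = {mcc(y_true, y_pred):.3f}, AUC = {auc(y_true, scores):.3f}")
```

MCC depends on the chosen decision threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason the two are often reported together.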
Affiliation(s)
- Chanchala D Kaddi
- Department of Biomedical Engineering Georgia Institute of Technology Atlanta, GA 1-404-385-5059
- Wallace H Coulter
- Department of Biomedical Engineering Georgia Institute of Technology Atlanta, GA 1-404-385-5059
- May D Wang
- Department of Biomedical Engineering Georgia Institute of Technology Atlanta, GA 1-404-385-5059
|