1
Ke J, Zhao J, Li H, Yuan L, Dong G, Wang G. Prediction of protein N-terminal acetylation modification sites based on CNN-BiLSTM-attention model. Comput Biol Med 2024; 174:108330. [PMID: 38588617 DOI: 10.1016/j.compbiomed.2024.108330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/06/2024] [Accepted: 03/17/2024] [Indexed: 04/10/2024]
Abstract
N-terminal acetylation is one of the most common and important post-translational modifications (PTMs) of eukaryotic proteins. PTMs play a crucial role in various cellular processes and disease pathogenesis. Thus, the accurate identification of N-terminal acetylation sites is important for gaining insight into cellular processes and other possible functional mechanisms. Although some algorithmic models have been proposed, most were developed with traditional machine learning algorithms and small training datasets, which limits their practical application. Deep learning models, by contrast, are better at handling high-throughput and complex data. In this study, DeepCBA, a model based on a hybrid deep learning framework combining a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism, was constructed to detect N-terminal acetylation sites. DeepCBA was built as follows. First, a benchmark dataset was generated by selecting low-redundancy protein sequences from the UniProt database and further reducing sequence redundancy with the CD-HIT tool. Subsequently, tripeptide word vector features were generated on the benchmark dataset using the skip-gram model of the word2vec algorithm. Finally, the CNN, BiLSTM, and attention mechanism were combined, and the tripeptide word vector features were fed into the stacked model for multiple rounds of training. On the independent test dataset, the model achieved an accuracy of 80.51% and an area under the curve of 87.36%. Altogether, DeepCBA achieved superior performance compared with the baseline model and significantly outperformed most existing predictors. Additionally, our model can be used to identify disease loci and drug targets.
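The tripeptide features mentioned in this abstract start from splitting each protein sequence into three-residue "words". The following is a minimal sketch of that tokenization step only, assuming overlapping windows with a stride of one (the abstract does not state the stride). The helper name `tripeptide_tokens` and the example sequence are hypothetical and are not taken from the paper.

```python
def tripeptide_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a protein sequence into k-residue words (k=3 gives tripeptides).

    Assumption: overlapping windows with stride 1; the abstract does not
    specify how DeepCBA forms its tripeptides.
    """
    seq = sequence.strip().upper()
    # range() is empty when the sequence is shorter than k, so short
    # sequences yield an empty token list instead of raising.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Hypothetical six-residue example, for illustration only.
tokens = tripeptide_tokens("MSDKPQ")
print(tokens)
```

Token lists of this kind are what a skip-gram word2vec implementation typically consumes as "sentences". The embedding dimension and the CNN-BiLSTM-attention hyperparameters are not given in the abstract, so they are left out here.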
Affiliation(s)
- Jinsong Ke
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Jianmei Zhao
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China; College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Hongfei Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China; College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, Quzhou, 324000, China
- Guanghui Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
2
Amdahl MB, Petersen EE, Bocian K, Kaliszuk SJ, DeMartino AW, Tiwari S, Sparacino-Watkins CE, Corti P, Rose JJ, Gladwin MT, Fago A, Tejero J. The Zebrafish Cytochrome b5/Cytochrome b5 Reductase/NADH System Efficiently Reduces Cytoglobins 1 and 2: Conserved Activity of Cytochrome b5/Cytochrome b5 Reductases during Vertebrate Evolution. Biochemistry 2019; 58:3212-3223. [PMID: 31257865 DOI: 10.1021/acs.biochem.9b00406] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
Cytoglobin is a heme protein evolutionarily related to hemoglobin and myoglobin. Cytoglobin is expressed ubiquitously in mammalian tissues; however, its physiological functions remain unclear. Phylogenetic analyses indicate that the cytoglobin gene is highly conserved in vertebrate clades, from fish to reptiles, amphibians, birds, and mammals. Most proposed roles for cytoglobin require the maintenance of a pool of reduced cytoglobin (FeII). We have shown previously that the human cytochrome b5/cytochrome b5 reductase system, considered a quintessential hemoglobin/myoglobin reductant, can reduce human and zebrafish cytoglobins up to 250-fold faster than human hemoglobin or myoglobin. It was unclear whether this reduction of zebrafish cytoglobins by mammalian proteins indicates a pathway conserved through vertebrate evolution. Here, we report the reduction of zebrafish cytoglobins 1 and 2 by the zebrafish cytochrome b5 reductase and the two zebrafish cytochrome b5 isoforms. In addition, the reducing system also supports reduction of Globin X, a conserved globin in fish and amphibians. Indeed, the zebrafish reducing system can maintain a fully reduced pool for both cytoglobins, and both cytochrome b5 isoforms can support this process. We determined the P50 for oxygen to be 0.5 Torr for cytoglobin 1 and 4.4 Torr for cytoglobin 2 at 25 °C. Thus, even at low oxygen tensions, the reduced cytoglobins may exist predominantly in the oxygen-bound form. Under these conditions, the cytochrome b5/cytochrome b5 reductase system can support a conserved role for cytoglobins through evolution, providing electrons for redox signaling reactions such as nitric oxide dioxygenation, nitrite reduction, and phospholipid oxidation.
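To make the link between the reported P50 values and the claim about oxygen-bound forms concrete, here is a rough sketch that converts P50 into fractional oxygen saturation. It assumes simple non-cooperative binding (Hill coefficient of 1), which the abstract does not state. The function name `fractional_saturation` and the 5 Torr oxygen tension are illustrative choices, not values from the paper.

```python
def fractional_saturation(po2_torr: float, p50_torr: float, hill_n: float = 1.0) -> float:
    """Fraction of globin in the oxygen-bound form at a given oxygen tension.

    Uses the Hill equation Y = pO2^n / (pO2^n + P50^n).
    Assumption: hill_n = 1 (no cooperativity); not stated in the abstract.
    """
    if po2_torr < 0 or p50_torr <= 0:
        raise ValueError("pO2 must be >= 0 and P50 must be > 0")
    if po2_torr == 0:
        return 0.0
    ratio = (po2_torr / p50_torr) ** hill_n
    return ratio / (1.0 + ratio)


# P50 values at 25 °C as reported in the abstract.
P50_TORR = {"cytoglobin 1": 0.5, "cytoglobin 2": 4.4}

# 5 Torr is a hypothetical low oxygen tension chosen for illustration.
for name, p50 in P50_TORR.items():
    saturation = fractional_saturation(5.0, p50)
    print(f"{name}: {saturation:.2f} oxygen-bound at 5 Torr")
```

By definition, saturation is exactly one half when the oxygen tension equals P50, so any tension above 0.5 Torr (cytoglobin 1) or 4.4 Torr (cytoglobin 2) leaves more than half of the protein oxygen-bound under this model.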
Affiliation(s)
- Matthew B Amdahl
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Elin E Petersen
- Department of Bioscience, Aarhus University, DK-8000 Aarhus C, Denmark
- Kaitlin Bocian
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Stefan J Kaliszuk
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Anthony W DeMartino
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Sagarika Tiwari
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Courtney E Sparacino-Watkins
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Paola Corti
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Jason J Rose
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Mark T Gladwin
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Angela Fago
- Department of Bioscience, Aarhus University, DK-8000 Aarhus C, Denmark
- Jesús Tejero
- Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
3
Habib MAH, Gan CY, Abdul Latiff A, Ismail MN. Unrestrictive identification of post-translational modifications in Hevea brasiliensis latex. Biochem Cell Biol 2018; 96:818-824. [PMID: 30058361 DOI: 10.1139/bcb-2018-0020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/15/2023]
Abstract
The natural rubber latex extracted from the bark of Hevea brasiliensis plays various important roles in modern society. Post-translational modifications (PTMs) of the latex proteins are important for the stability and functionality of the proteins. In this study, latex proteins were acquired from the C-serum, lutoids, and rubber particle layers of latex without prior enrichment steps; they were fragmented using collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), and electron-transfer dissociation (ETD) activation methods. PEAKS 7 was used to search for unspecified PTMs, and the results were cross-checked against PTM prediction tools. In total, 73 peptides in 47 proteins from H. brasiliensis protein sequences derived from UniProtKB were identified and predicted to be post-translationally modified. The PTMs identified include phosphorylation, lysine acetylation, N-terminal acetylation, hydroxylation, and ubiquitination. Most of the PTMs discovered have yet to be reported in UniProt; they should assist research into the functional properties of H. brasiliensis latex proteins and may serve as useful biomarkers. The data are available via the MassIVE repository with identifier MSV000082419.
Affiliation(s)
- Mohd Afiq Hazlami Habib
- Analytical Biochemistry Research Centre, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia
- Chee-Yuen Gan
- Analytical Biochemistry Research Centre, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia
- Mohd Nazri Ismail
- Analytical Biochemistry Research Centre, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia
5
Shi S, Wang L, Cao M, Chen G, Yu J. Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications. Brief Bioinform 2018; 20:1597-1606. [DOI: 10.1093/bib/bby036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/15/2018] [Revised: 03/07/2018] [Indexed: 12/18/2022]
Abstract
Accumulating studies indicate that amino acid variations, by changing the residue type at target sites or at key flanking residues, can directly or indirectly influence protein posttranslational modifications (PTMs) and have a detrimental effect on protein function. Computational mutation analysis can greatly narrow down the experimental work required. To increase the utilization of current computational resources, we first provide an overview of the computational prediction of amino acid variations that influence protein PTMs and of their functional analysis. We also discuss the challenges that will be faced in developing novel in silico approaches. Better methods for analyzing mutations that affect protein PTMs will help to facilitate the development of personalized precision medicine.
Affiliation(s)
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Lina Wang
- Department of Science, Nanchang Institute of Technology, Nanchang, Jiangxi 330031, China
- Man Cao
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Guodong Chen
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Jialin Yu
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
6
Comprehensive analysis of human protein N-termini enables assessment of various protein forms. Sci Rep 2017; 7:6599. [PMID: 28747677 PMCID: PMC5529458 DOI: 10.1038/s41598-017-06314-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 02/28/2017] [Accepted: 06/09/2017] [Indexed: 01/10/2023]
Abstract
Various forms of protein (proteoforms) are generated by genetic variations, alternative splicing, alternative translation initiation, co- or post-translational modification and proteolysis. Different proteoforms are in part discovered by characterizing their N-terminal sequences. Here, we introduce an N-terminal-peptide-enrichment method, Nrich. Filter-aided negative selection formed the basis for the use of two N-blocking reagents and two endoproteases in this method. We identified 6,525 acetylated (or partially acetylated) and 6,570 free protein N-termini arising from 5,727 proteins in HEK293T human cells. The protein N-termini included translation initiation sites annotated in the UniProtKB database, putative alternative translational initiation sites, and N-terminal sites exposed after signal/transit/pro-peptide removal or unknown processing, revealing various proteoforms in cells. In addition, 46 novel protein N-termini were identified in 5′ untranslated region (UTR) sequence with pseudo start codons. Our data showing the observation of N-terminal sequences of mature proteins constitutes a useful resource that may provide information for a better understanding of various proteoforms in cells.
7
Identification of the sequence determinants of protein N-terminal acetylation through a decision tree approach. BMC Bioinformatics 2017; 18:289. [PMID: 28578658 PMCID: PMC5457594 DOI: 10.1186/s12859-017-1699-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/27/2017] [Accepted: 05/18/2017] [Indexed: 11/29/2022]
Abstract
Background: N-terminal acetylation is one of the most common protein modifications in eukaryotes and occurs co-translationally when the N-terminus of the nascent polypeptide is still attached to the ribosome. This modification has been shown to be involved in a wide range of biological phenomena such as protein half-life regulation, protein-protein and protein-membrane interactions, and protein subcellular localization. Thus, accurately predicting which proteins receive an acetyl group based on their protein sequence is expected to facilitate the functional study of this modification. As the occurrence of N-terminal acetylation strongly depends on the context of protein sequences, attempts to understand the sequence determinants of N-terminal acetylation were conducted initially by simply examining the N-terminal sequences of many acetylated and unacetylated proteins and more recently by machine learning approaches. However, a complete understanding of the sequence determinants of this modification remains to be elucidated.
Results: We obtained curated N-terminally acetylated and unacetylated sequences from the UniProt database and employed a decision tree algorithm to identify the sequence determinants of N-terminal acetylation for proteins whose initiator methionine (iMet) residues have been removed. The results suggested that the main determinants of N-terminal acetylation are contained within the first five residues following iMet and that the first and second positions are the most important discriminators for the occurrence of this phenomenon. The results also indicated the existence of position-specific preferred and inhibitory residues that determine the occurrence of N-terminal acetylation. The developed predictor software, termed NT-AcPredictor, accurately predicted N-terminal acetylation, with an overall performance comparable or superior to that of preceding predictors incorporating machine learning algorithms.
Conclusion: Our machine learning approach based on a decision tree algorithm successfully provided several sequence determinants of N-terminal acetylation for proteins lacking iMet, some of which have not previously been described. Although these sequence determinants remain insufficient to comprehensively predict the occurrence of this modification, indicating that further work on this topic is still required, the developed predictor, NT-AcPredictor, can be used to predict N-terminal acetylation with an accuracy of more than 80%.
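The finding that the first five residues after iMet carry most of the signal suggests a simple positional encoding as input to a decision tree. The sketch below only extracts those position-specific features. It is not NT-AcPredictor's implementation; the helper name `nterm_window`, the assumption that input sequences still carry a leading methionine, and the example sequence are all hypothetical.

```python
def nterm_window(sequence: str, width: int = 5) -> dict[str, str]:
    """Return the residues at positions 1..width after the initiator Met.

    Assumption: the input is a full-length sequence whose leading 'M' is
    the iMet that has been removed in the mature protein. A sequence that
    does not start with 'M' is used as-is.
    """
    seq = sequence.strip().upper()
    if seq.startswith("M"):
        seq = seq[1:]  # drop iMet so that pos1 is the new N-terminal residue
    window = seq[:width]
    # One categorical feature per position, e.g. {"pos1": "S", "pos2": "D", ...}.
    return {f"pos{i + 1}": residue for i, residue in enumerate(window)}


# Hypothetical sequence, for illustration only.
features = nterm_window("MSDKPQRT")
print(features)
```

Keeping each position as its own categorical feature lets a tree split on rules of the form "residue at position 1 is X", which matches the position-specific preferred and inhibitory residues described in the abstract.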