1
Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024; 64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
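Two of the metrics reported above, the Matthews correlation coefficient and the F1-score, follow directly from confusion-matrix counts. The sketch below shows the standard formulas only; the function names and the example counts are hypothetical and are not taken from the TransEFVP paper or its repository.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: MCC is reported as 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    if 2 * tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts for a binary pathogenic/benign classifier.
tp, tn, fp, fn = 90, 80, 20, 10
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
```

Unlike F1, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when the pathogenic and benign classes are imbalanced.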
Affiliation(s)
- Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, PR China
- Yan Liu
- Department of Computer Science, Yangzhou University, Yangzhou 225100, PR China
- Yumeng Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
- South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
2
Song YC, Das D, Zhang Y, Chen MX, Fernie AR, Zhu FY, Han J. Proteogenomics-based functional genome research: approaches, applications, and perspectives in plants. Trends Biotechnol 2023; 41:1532-1548. [PMID: 37365082 DOI: 10.1016/j.tibtech.2023.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/17/2023] [Revised: 05/17/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
Proteogenomics (PG) integrates the proteome with the genome and transcriptome to refine gene models and annotation. Coupled with single-cell (SC) assays, PG effectively distinguishes heterogeneity among cell groups. Affiliating spatial information to PG reveals the high-resolution circuitry within SC atlases. Additionally, PG can investigate dynamic changes in protein-coding genes in plants across growth and development as well as stress and external stimulation, significantly contributing to the functional genome. Here we summarize existing PG research in plants and introduce the technical features of various methods. Combining PG with other omics, such as metabolomics and peptidomics, can offer even deeper insights into gene functions. We argue that the application of PG will represent an important font of foundational knowledge for plants.
Affiliation(s)
- Yu-Chen Song
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Tree Genetics and Biotechnology of Educational Department of China, Key Laboratory of State Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Life Sciences, Nanjing Forestry University, Nanjing 210037, China; College of Biology and Environment, Nanjing Forestry University, Nanjing 210037, China
- Debatosh Das
- College of Agriculture, Food and Natural Resources (CAFNR), Division of Plant Sciences and Technology, 52 Agricultural Building, University of Missouri-Columbia, MO 65201, USA
- Youjun Zhang
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany; Center of Plant Systems Biology and Biotechnology, Plovdiv, Bulgaria
- Mo-Xian Chen
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Tree Genetics and Biotechnology of Educational Department of China, Key Laboratory of State Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Life Sciences, Nanjing Forestry University, Nanjing 210037, China; College of Biology and Environment, Nanjing Forestry University, Nanjing 210037, China.
- Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany; Center of Plant Systems Biology and Biotechnology, Plovdiv, Bulgaria.
- Fu-Yuan Zhu
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Tree Genetics and Biotechnology of Educational Department of China, Key Laboratory of State Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Life Sciences, Nanjing Forestry University, Nanjing 210037, China; College of Biology and Environment, Nanjing Forestry University, Nanjing 210037, China.
- Jiangang Han
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Tree Genetics and Biotechnology of Educational Department of China, Key Laboratory of State Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Life Sciences, Nanjing Forestry University, Nanjing 210037, China; College of Biology and Environment, Nanjing Forestry University, Nanjing 210037, China.
3
Filippova TA, Masamrekh RA, Shumyantseva VV, Latsis IA, Farafonova TE, Ilina IY, Kanashenko SL, Moshkovskii SA, Kuzikov AV. Electrochemical biosensor for trypsin activity assay based on cleavage of immobilized tyrosine-containing peptide. Talanta 2023; 257:124341. [PMID: 36821964 DOI: 10.1016/j.talanta.2023.124341] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Revised: 01/13/2023] [Accepted: 02/09/2023] [Indexed: 02/16/2023]
Abstract
In this work, we propose a biosensor for assaying trypsin proteolytic activity, based on model peptides immobilized on screen-printed electrodes (SPE) modified with gold nanoparticles (AuNPs) prepared by an electrosynthetic method. Sensing of proteolytic activity was based on electrochemical oxidation of the tyrosine residues of the peptides. We designed peptides containing an N-terminal cysteine residue for immobilization on the AuNP-modified SPE, a trypsin-specific cleavage site, and a tyrosine residue as a redox label. The peptides were immobilized on the SPE through chemical bonds between the mercapto groups of the N-terminal cysteine residues and the AuNPs. After incubation with trypsin, time-dependent cleavage of the immobilized peptides was observed as a decline in the tyrosine electrochemical oxidation signal. The kinetic parameters of trypsin toward the CGGGRYR peptide, namely the catalytic constant (kcat), the Michaelis constant (KM), and the catalytic efficiency (kcat/KM), were determined as 0.33 ± 0.01 min⁻¹, 198 ± 24 nM, and 0.0016 min⁻¹ nM⁻¹, respectively. Using the developed biosensor, we demonstrated that trypsin specificity can be analyzed toward peptides with amino acid residues that disrupt proteolysis. Further, we designed peptides with proline or glutamic acid residues after the cleavage site (CGGRPYR and CGGREYR); trypsin had reduced activity toward both of them, in agreement with existing knowledge of the enzyme's specificity. The developed biosensor system allows comparative analysis of protease steady-state kinetic parameters and specificity toward model peptides with different amino acid sequences.
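As a worked check of the kinetic parameters quoted above, the catalytic efficiency is the ratio kcat/KM. The sketch below recomputes it from the rounded published values and propagates the stated uncertainties, assuming the errors on kcat and KM are independent (an assumption made here, not stated in the abstract). The function names are illustrative.

```python
import math


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM."""
    return kcat / km


def efficiency_uncertainty(kcat, kcat_err, km, km_err):
    """First-order error propagation for a ratio.

    Assumes the uncertainties on kcat and KM are independent.
    """
    relative = math.sqrt((kcat_err / kcat) ** 2 + (km_err / km) ** 2)
    return catalytic_efficiency(kcat, km) * relative


# Values quoted in the abstract: kcat in min^-1, KM in nM.
eff = catalytic_efficiency(0.33, 198.0)
err = efficiency_uncertainty(0.33, 0.01, 198.0, 24.0)
print(f"kcat/KM = {eff:.4f} +/- {err:.4f} min^-1 nM^-1")
```

The ratio of the rounded values comes out near 0.0017 min⁻¹ nM⁻¹ with a propagated uncertainty of about 0.0002, so the reported 0.0016 min⁻¹ nM⁻¹ lies within that uncertainty; the small difference presumably reflects rounding of the published kcat and KM.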
Affiliation(s)
- Tatiana A Filippova
- Pirogov Russian National Research Medical University, 1 Ostrovityanova st., Moscow 117997, Russia; Institute of Biomedical Chemistry, 10, Pogodinskaya st., Moscow, 119121, Russia
- Rami A Masamrekh
- Pirogov Russian National Research Medical University, 1 Ostrovityanova st., Moscow 117997, Russia; Institute of Biomedical Chemistry, 10, Pogodinskaya st., Moscow, 119121, Russia
- Victoria V Shumyantseva
- Pirogov Russian National Research Medical University, 1 Ostrovityanova st., Moscow 117997, Russia; Institute of Biomedical Chemistry, 10, Pogodinskaya st., Moscow, 119121, Russia
- Ivan A Latsis
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a Malaya Pirogovskaya st., Moscow, 119435, Russia
- Irina Y Ilina
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a Malaya Pirogovskaya st., Moscow, 119435, Russia
- Sergey L Kanashenko
- Institute of Biomedical Chemistry, 10, Pogodinskaya st., Moscow, 119121, Russia
- Sergei A Moshkovskii
- Pirogov Russian National Research Medical University, 1 Ostrovityanova st., Moscow 117997, Russia; Federal Research and Clinical Center of Physical-Chemical Medicine, 1a Malaya Pirogovskaya st., Moscow, 119435, Russia.
- Alexey V Kuzikov
- Pirogov Russian National Research Medical University, 1 Ostrovityanova st., Moscow 117997, Russia; Institute of Biomedical Chemistry, 10, Pogodinskaya st., Moscow, 119121, Russia.
4
Vašíček J, Skiadopoulou D, Kuznetsova KG, Wen B, Johansson S, Njølstad PR, Bruckner S, Käll L, Vaudel M. Finding haplotypic signatures in proteins. Gigascience 2022; 12:giad093. [PMID: 37919975 PMCID: PMC10622322 DOI: 10.1093/gigascience/giad093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 09/24/2023] [Accepted: 10/08/2023] [Indexed: 11/04/2023] Open
Abstract
BACKGROUND: The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown. FINDINGS: Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility to match peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched to such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, the evaluation of the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. CONCLUSIONS: As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.
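The notion of a multivariant peptide can be illustrated with a small in-silico digestion. The sketch below is not the authors' pipeline: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), ignores missed cleavages, and uses a made-up sequence and made-up substitution positions.

```python
def tryptic_digest(sequence):
    """Split a protein into (start_index, peptide) pairs.

    Simplified rule (an assumption): cleave C-terminal to K or R,
    except when the following residue is P. No missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleaves = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleaves or is_last:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    return peptides


def multivariant_peptides(sequence, substitution_positions):
    """Return peptides that cover two or more substituted positions (0-based)."""
    hits = []
    for start, peptide in tryptic_digest(sequence):
        end = start + len(peptide)
        covered = sorted(p for p in substitution_positions if start <= p < end)
        if len(covered) >= 2:
            hits.append((peptide, covered))
    return hits


# Hypothetical protein haplotype with substitutions at positions 5, 9 and 13.
haplotype = "MAGKTLSRPEVKDDA"
print(tryptic_digest(haplotype))
print(multivariant_peptides(haplotype, {5, 9, 13}))
```

In this toy case the substitutions at positions 5 and 9 fall in the same tryptic peptide, so a search database built from single substitutions alone would not contain the peptide that is actually present in the sample.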
Affiliation(s)
- Jakub Vašíček
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Dafni Skiadopoulou
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Ksenia G Kuznetsova
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Bo Wen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Stefan Johansson
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Department of Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
- Pål R Njølstad
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Children and Youth Clinic, Haukeland University Hospital, Bergen 5021, Norway
- Stefan Bruckner
- Chair of Visual Analytics, Institute for Visual and Analytic Computing, University of Rostock, Rostock 18051, Germany
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH–Royal Institute of Technology, Solna 17121, Sweden
- Marc Vaudel
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo 0473, Norway
5
Nasaev SS, Kopeykina AS, Kuznetsova KG, Levitsky LI, Moshkovskii SA. Proteomic Analysis of Zebrafish Protein Recoding via mRNA Editing by ADAR Enzymes. BIOCHEMISTRY. BIOKHIMIIA 2022; 87:1301-1309. [PMID: 36509721 DOI: 10.1134/s0006297922110098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
RNA editing by adenosine deaminases of the ADAR family can lead to protein recoding, since inosine formed from adenosine in mRNA is complementary to cytosine; the resulting codon editing might introduce amino acid substitutions into translated proteins. Proteome recoding can have functional consequences, which have been described in many animals including humans. Using a protein recoding database derived from publicly available transcriptome data, we identified, for the first time, recoding sites in zebrafish shotgun proteomes. Out of more than a hundred predicted recoding events, ten substitutions were found across the six datasets used. Seven of them were in AMPA glutamate receptor subunits, whose recoding has been well described, and are conserved among vertebrates. Three sites were specific to zebrafish proteins and were found in the transmembrane receptors astrotactin 1 and neuregulin 3b (proteins involved in neuronal adhesion and signaling) and in the rims2b gene product (a presynaptic membrane protein participating in neurotransmitter release), respectively. Further studies are needed to elucidate the role of recoding of these three proteins in zebrafish.
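The recoding mechanism described above can be sketched in a few lines: because inosine pairs with cytosine, an edited adenosine is read as guanosine during translation, so the codon is effectively changed from A to G at that position. The example below uses the standard genetic code; the function name and the example codons are illustrative and are not drawn from the study's database.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def recode(codon, position):
    """Return (original, edited) amino acids for A-to-I editing.

    The adenosine at `position` (0-based) becomes inosine, which the
    ribosome reads as guanosine, so the codon is translated with G there.
    """
    if codon[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at this position")
    edited = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


print(recode("CAG", 1))  # glutamine codon read as CGG after editing
print(recode("CUA", 2))  # third-position edit that leaves the amino acid unchanged
```

Only edits that change the encoded amino acid, such as the first example, are visible at the proteome level; synonymous edits like the second leave the peptide sequence unchanged and cannot be detected by shotgun proteomics.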
Affiliation(s)
- Shamsudin S Nasaev
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, 119435, Russia; Institute of Biomedical Chemistry, Moscow, 119121, Russia
- Anna S Kopeykina
- Pirogov Russian National Research Medical University, Moscow, 117997, Russia
- Lev I Levitsky
- Talrose Institute for Energy Problems of Chemical Physics, Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Sergei A Moshkovskii
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, 119435, Russia; Pirogov Russian National Research Medical University, Moscow, 117997, Russia