1
Suyash S, Jha A, Maitra P, Punia P, Mishra A. Differentiating stable and unstable protein using convolution neural network and molecular dynamics simulations. Comput Biol Chem 2024; 110:108081. [PMID: 38677012 DOI: 10.1016/j.compbiolchem.2024.108081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 03/17/2024] [Accepted: 04/17/2024] [Indexed: 04/29/2024]
Abstract
Protein stability, a critical aspect of molecular biology and biochemistry, hinges on an intricate balance of thermodynamic and structural factors. Determining protein stability is crucial for understanding and manipulating biological machinery, as it is directly correlated with protein function. This study examines protein stability and its dependence on various factors, including thermodynamics, thermal conditions, and structural properties. It focuses on the free energy change of unfolding (ΔGunfolding), the change in heat capacity (ΔCp) upon protein structural transition, the melting temperature (Tm), and the number of disulfide bonds, which are critical parameters in understanding protein stability. A machine learning (ML) predictive model was developed to estimate these four parameters from the primary sequence of the protein. The study was motivated by the shortage of available tools that predict protein stability from multiple parameters. A multi-layer convolutional neural network (CNN) was adopted to develop a more reliable ML model. An individual predictive model was prepared for each property, and all of the models showed high accuracy: the R² (coefficient of determination) values were 0.79, 0.78, 0.92 and 0.92 for ΔG, ΔCp, Tm and disulfide bonds, respectively. A case study on the stability of two homologous proteins was presented to validate the results predicted by the developed model. The case study included in silico analysis of protein stability using molecular docking and molecular dynamics simulations. This validation study supported the accuracy of each model in predicting the stability-associated properties. The alignment of physics-based principles with ML models has provided an opportunity to develop a fast machine learning solution to replace the computationally demanding physics-based calculations used to determine protein stability. Furthermore, this work provided valuable insights into the impact of mutation on protein stability, which has implications for the field of protein engineering. The source code is available at https://github.com/Growdeatechnology.
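The abstract reports model quality as a coefficient of determination. As a reminder of what that number measures, here is a minimal sketch of the standard definition, R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical melting temperatures (deg C), made up purely for illustration.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(round(r_squared(observed, predicted), 3))  # prints 0.98
```

An R² of 0.92 therefore means the model's squared error is 8% of the variance of the observed values on the evaluated set.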
Affiliation(s)
- Akshat Jha
  - Growdea Technologies Pvt. Ltd., Gurugram, Haryana 122004, India
- Priyasha Maitra
  - Growdea Technologies Pvt. Ltd., Gurugram, Haryana 122004, India
- Parveen Punia
  - Pt. Neki Ram Sharma Government College, Rohtak, Haryana 124001, India
- Avinash Mishra
  - Growdea Technologies Pvt. Ltd., Gurugram, Haryana 122004, India
2
Gooran N, Kopra K. Fluorescence-Based Protein Stability Monitoring-A Review. Int J Mol Sci 2024; 25:1764. [PMID: 38339045 PMCID: PMC10855643 DOI: 10.3390/ijms25031764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024] [Open Access]
Abstract
Proteins are large biomolecules with a specific structure that is composed of one or more long amino acid chains. Correct protein structure is directly linked to correct function, and many environmental factors can have either positive or negative effects on this structure. Thus, there is a clear need for methods enabling the study of proteins, their correct folding, and components affecting protein stability. There is a significant number of label-free methods to study protein stability. In this review, we provide a general overview of these methods, but the main focus is on fluorescence-based techniques with low instrument and expertise demands. Different aspects related to thermal shift assays (TSAs), also called differential scanning fluorimetry (DSF) or ThermoFluor, are introduced and compared to isothermal chemical denaturation (ICD). Finally, we discuss the challenges and comparative aspects related to these methods, as well as future opportunities and assay development directions.
Affiliation(s)
- Kari Kopra
  - Department of Chemistry, University of Turku, Henrikinkatu 2, 20500 Turku, Finland
3
Andrews T, Seravalli J, Powers R. The reversible low-temperature instability of human DJ-1 oxidative states. Biopolymers 2024; 115:e23534. [PMID: 36972340 PMCID: PMC10948107 DOI: 10.1002/bip.23534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 02/21/2023] [Accepted: 03/07/2023] [Indexed: 03/29/2023]
Abstract
DJ-1 is a homodimeric protein that is centrally involved in various human diseases including Parkinson disease (PD). DJ-1 protects against oxidative damage and mitochondrial dysfunction through a homeostatic control of reactive oxygen species (ROS). DJ-1 pathology results from a loss of function, where ROS readily oxidize a highly conserved and functionally essential cysteine (C106). The over-oxidation of DJ-1 C106 leads to a dynamically destabilized and biologically inactivated protein. An analysis of the structural stability of DJ-1 as a function of oxidative state and temperature may provide further insights into the role the protein plays in PD progression. NMR spectroscopy, circular dichroism, analytical ultracentrifugation sedimentation equilibrium, and molecular dynamics simulations were utilized to investigate the structure and dynamics of the reduced, oxidized (C106-SO₂⁻), and over-oxidized (C106-SO₃⁻) forms of DJ-1 for temperatures ranging from 5°C to 37°C. The three oxidative states of DJ-1 exhibited distinct temperature-dependent structural changes. A cold-induced aggregation occurred for the three DJ-1 oxidative states by 5°C, where the over-oxidized state aggregated at significantly higher temperatures than both the oxidized and reduced forms. Only the oxidized and over-oxidized forms of DJ-1 exhibited a mixed state containing both folded and partially denatured protein that likely preserved secondary structure content. The relative amount of this denatured form of DJ-1 increased as the temperature was lowered, consistent with cold denaturation. Notably, the cold-induced aggregation and denaturation for the DJ-1 oxidative states were completely reversible. The dramatic changes in the structural stability of DJ-1 as a function of oxidative state and temperature are relevant to its role in PD and its functional response to oxidative stress.
Affiliation(s)
- Tessa Andrews
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Javier Seravalli
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0664, USA
- Robert Powers
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
  - Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE 68588-0664, USA
  - Nebraska Center for Integrated Biomolecular Communication, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
4
Rollo C, Pancotti C, Birolo G, Rossi I, Sanavia T, Fariselli P. Influence of Model Structures on Predictors of Protein Stability Changes from Single-Point Mutations. Genes (Basel) 2023; 14:2228. [PMID: 38137050 PMCID: PMC10742815 DOI: 10.3390/genes14122228] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/25/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 12/24/2023] [Open Access]
Abstract
Missense variation in genomes can affect protein structural stability and, in turn, cell physiology. Predicting the impact of those variations is relevant, and the best-performing computational tools exploit protein structure information. However, most protein sequence variants currently lack an experimentally resolved structure, although comparative or ab initio tools can provide one. Here, we evaluate the impact of model structures, compared to experimental structures, on the predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representation are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated from comparative modeling, even on single-residue substitutions.
Affiliation(s)
- Cesare Rollo
  - Department of Medical Sciences, University Torino, 10126 Torino, Italy (G.B.); (I.R.); (T.S.); (P.F.)
5
Li M, Wang H, Yang Z, Zhang L, Zhu Y. DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences. Comput Struct Biotechnol J 2023; 21:5544-5560. [PMID: 38034401 PMCID: PMC10681957 DOI: 10.1016/j.csbj.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/02/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] [Open Access]
Abstract
Thermally stable proteins find extensive applications in industrial production and pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (Tm). However, due to the limited availability of experimentally determined Tm data and the insufficient accuracy of existing computational methods in predicting Tm, there is an urgent need for a computational approach to accurately forecast the Tm values of thermophilic proteins. Here, we present a deep learning-based model, called DeepTM, which exclusively utilizes protein sequences as input and accurately predicts the Tm values of target thermophilic proteins on a dataset consisting of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM demonstrates excellent performance with a coefficient of determination (R²) of 0.75, Pearson correlation coefficient (P) of 0.87, and root mean square error (RMSE) of 6.24 ℃. We further analyzed the sequence features that determine the thermal stability of thermophilic proteins and found that dipeptide frequency, optimal growth temperature (OGT) of the host organisms, and the evolutionary information of the protein significantly affect its melting temperature. We compared the performance of DeepTM with recently reported methods, ProTstab2 and DeepSTABp, in predicting the Tm values on two blind test datasets. One dataset comprised 22 PET plastic-degrading enzymes, while the other included 29 thermally stable proteins of broader classification. In the PET plastic-degrading enzyme dataset, DeepTM achieved an RMSE of 8.25 ℃. Compared to ProTstab2 (20.05 ℃) and DeepSTABp (20.97 ℃), DeepTM demonstrated a reduction in RMSE of 58.85% and 60.66%, respectively. In the dataset of thermally stable proteins, DeepTM (RMSE=7.66 ℃) demonstrated a 51.73% reduction in RMSE compared to ProTstab2 (RMSE=15.87 ℃). DeepTM, with the sole requirement of protein sequence information, accurately predicts the melting temperature and achieves a fully end-to-end prediction process, thus providing enhanced convenience and expediency for further protein engineering.
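The RMSE reductions quoted in the abstract can be reproduced from the RMSE values it reports. The sketch below assumes the reduction is computed as (baseline − DeepTM) / baseline; the helper name `relative_reduction` is hypothetical and is not part of DeepTM.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# (baseline RMSE, DeepTM RMSE) in deg C, as quoted in the abstract.
comparisons = {
    "PET enzymes, vs ProTstab2": (20.05, 8.25),
    "PET enzymes, vs DeepSTABp": (20.97, 8.25),
    "Thermally stable proteins, vs ProTstab2": (15.87, 7.66),
}

for label, (baseline, deeptm) in comparisons.items():
    print(f"{label}: {relative_reduction(baseline, deeptm):.2f}%")
# PET enzymes, vs ProTstab2: 58.85%
# PET enzymes, vs DeepSTABp: 60.66%
# Thermally stable proteins, vs ProTstab2: 51.73%
```

Under that assumption, all three percentages match the figures stated in the abstract.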
Affiliation(s)
- Mengyu Li
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Hongzhao Wang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zhenwu Yang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Longgui Zhang
  - SINOPEC Beijing Research Institute of Chemical Industry, Beijing 100013, China
- Yushan Zhu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, Beijing 100029, China
6
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
7
Komp E, Alanzi HN, Francis R, Vuong C, Roberts L, Mosallanejad A, Beck DAC. Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe. Sci Data 2023; 10:682. [PMID: 37805601 PMCID: PMC10560248 DOI: 10.1038/s41597-023-02553-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 09/08/2023] [Indexed: 10/09/2023] [Open Access]
Abstract
Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.
Affiliation(s)
- Evan Komp
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Humood N Alanzi
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Ryan Francis
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Chau Vuong
  - Department of Biochemistry, University of Washington, Seattle, USA
- Logan Roberts
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- Amin Mosallanejad
  - Department of Chemical Engineering, University of Washington, Seattle, USA
- David A C Beck
  - Department of Chemical Engineering, University of Washington, Seattle, USA
  - eScience Institute, University of Washington, Seattle, USA
  - Paul G. Allen School of Computer Science, University of Washington, Seattle, USA
8
Jung F, Frey K, Zimmer D, Mühlhaus T. DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. Int J Mol Sci 2023; 24:7444. [PMID: 37108605 PMCID: PMC10138888 DOI: 10.3390/ijms24087444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 04/04/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023] [Open Access]
Abstract
Proteins are essential macromolecules that carry out a plethora of biological functions. The thermal stability of proteins is an important property that affects their function and determines their suitability for various applications. However, current experimental approaches, primarily thermal proteome profiling, are expensive, labor-intensive, and have limited proteome and species coverage. To close the gap between available experimental data and sequence information, a novel protein thermal stability predictor called DeepSTABp has been developed. DeepSTABp uses a transformer-based protein language model for sequence embedding and state-of-the-art feature extraction in combination with other deep learning techniques for end-to-end protein melting temperature prediction. DeepSTABp can predict the thermal stability of a wide range of proteins, making it a powerful and efficient tool for large-scale prediction. The model captures the structural and biological properties that impact protein stability, and it allows for the identification of the structural features that contribute to protein stability. DeepSTABp is available to the public via a user-friendly web interface, making it accessible to researchers in various fields.
Affiliation(s)
- Felix Jung
  - Computational Systems Biology, RPTU University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Kevin Frey
  - Computational Systems Biology, RPTU University of Kaiserslautern, 67663 Kaiserslautern, Germany
- David Zimmer
  - Computational Systems Biology, RPTU University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Timo Mühlhaus
  - Computational Systems Biology, RPTU University of Kaiserslautern, 67663 Kaiserslautern, Germany
9
Dhanalakshmi K, Kuramitsu S, Yokoyama S, Kumarevel T, Ponnuraj K. Crystal structure analysis of pyrrolidone carboxyl peptidase from Thermus thermophilus. Biophys Chem 2023; 293:106946. [PMID: 36563626 DOI: 10.1016/j.bpc.2022.106946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/06/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Pyrrolidone carboxyl peptidase (PCP) hydrolytically removes the L-pyroglutamic acid from the amino terminal region of pyroglutamyl proteins or peptides. So far, only a limited number of structures of PCP have been solved. Here we report the crystal structure of pyrrolidone carboxyl peptidase from Thermus thermophilus (TtPCP) which has been solved using the molecular replacement method and refined at 1.9 Å resolution. TtPCP follows the α/β/α architecture in which the central β-sheets are surrounded by α-helices on both sides. The inter subunit contact between two monomers consists of two short antiparallel β-strands and part of a long protrusion loop. By comparing the TtPCP with its structural homologs, we identified the putative catalytic triad residues as Glu76, Cys139 and His160. A unique disulfide link found in some homologs of TtPCP, formed between two monomers that provide thermal stability to the protein, is not observed in TtPCP. Hence, being a thermophilic protein, the putative thermal stability of TtPCP could be due to more intra and inter-molecular hydrogen bonds, hydrophobic and ion pair interactions when compared with its mesophilic counterpart. The structural details of TtPCP will be helpful to understand the basis of the intrinsic stability of thermophilic proteins. Also, it could be useful for protein engineering.
Affiliation(s)
- K Dhanalakshmi
  - Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600 025, India
- Seiki Kuramitsu
  - Department of Biological Sciences, Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan
- Shigeyuki Yokoyama
  - Structural Biology Laboratory, RIKEN Yokohama Institute, RIKEN, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan
- Thirumananseri Kumarevel
  - Structural Biology Laboratory, RIKEN Yokohama Institute, RIKEN, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan; Laboratory for Transcription Structural Biology, RIKEN Center for Biosystems Dynamic Research, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan
- Karthe Ponnuraj
  - Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600 025, India
10
Zhao J, Yan W, Yang Y. DeepTP: A Deep Learning Model for Thermophilic Protein Prediction. Int J Mol Sci 2023; 24:2217. [PMID: 36768540 PMCID: PMC9917291 DOI: 10.3390/ijms24032217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 01/19/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023] [Open Access]
Abstract
Thermophilic proteins have important value in the fields of biopharmaceuticals and enzyme engineering. Most existing thermophilic protein prediction models are based on traditional machine learning algorithms and do not fully utilize protein sequence information. To solve this problem, a deep learning model based on self-attention and multiple-channel feature fusion was proposed to predict thermophilic proteins, called DeepTP. First, a large new dataset consisting of 20,842 proteins was constructed. Second, a convolutional neural network and bidirectional long short-term memory network were used to extract the hidden features in protein sequences. Different weights were then assigned to features through self-attention, and finally, biological features were integrated to build a prediction model. In a performance comparison with existing methods, DeepTP had better performance and scalability in an independent balanced test set and validation set, with AUC values of 0.944 and 0.801, respectively. In the unbalanced test set, DeepTP had an average precision (AP) of 0.536. The tool is freely available.
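The abstract reports AUC on balanced sets and average precision (AP) on the unbalanced set. As a minimal sketch of what the two metrics measure, the code below computes both in plain Python from their usual definitions. The function names, labels and scores are hypothetical and unrelated to DeepTP's own evaluation code. This AP variant ranks tied scores in input order, which may differ from library implementations when scores tie.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values taken at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical labels (1 = thermophilic) and model scores, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))            # prints 0.556
print(round(average_precision(labels, scores), 3))  # prints 0.722
```

AUC is insensitive to class balance, whereas AP depends on the fraction of positives, which is one reason AP is commonly reported for unbalanced test sets.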
Affiliation(s)
- Jianjun Zhao
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Wenying Yan
  - Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215123, China
  - Center for Systems Biology, Soochow University, Suzhou 215123, China
  - Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Suzhou 215123, China
  - Correspondence: (W.Y.); (Y.Y.)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  - Correspondence: (W.Y.); (Y.Y.)