Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang Y, Zhu G, Li K, Li F, Huang L, Duan M, Zhou F. HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction. Brief Bioinform 2022;23:6581432. [PMID: 35514183 PMCID: PMC9487590 DOI: 10.1093/bib/bbac173] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/28/2021] [Revised: 03/29/2022] [Accepted: 04/18/2022] [Indexed: 12/11/2022] Open


Cited by Other Article(s)

Li Y, Zou Q, Dai Q, Stalin A, Luo X. Identifying the DNA methylation preference of transcription factors using ProtBERT and SVM. PLoS Comput Biol 2025;21:e1012513. [PMID: 40359430 PMCID: PMC12121914 DOI: 10.1371/journal.pcbi.1012513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 05/29/2025] [Accepted: 04/29/2025] [Indexed: 05/15/2025] Open

Luo J, Zhao K, Chen J, Yang C, Qu F, Liu Y, Jin X, Yan K, Zhang Y, Liu B. iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning. Genomics Proteomics Bioinformatics 2025;22:qzae084. [PMID: 39585308 PMCID: PMC12011362 DOI: 10.1093/gpbjnl/qzae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/25/2024] [Accepted: 11/21/2024] [Indexed: 11/26/2024]

Jiang R, Yue Z, Shang L, Wang D, Wei N. PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates. Metab Eng Commun 2024;19:e00248. [PMID: 39310048 PMCID: PMC11414552 DOI: 10.1016/j.mec.2024.e00248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 07/14/2024] [Accepted: 09/03/2024] [Indexed: 09/25/2024] Open

Sui J, Chen J, Chen Y, Iwamori N, Sun J. GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion. BMC Genomics 2024;25:1019. [PMID: 39478465 PMCID: PMC11526662 DOI: 10.1186/s12864-024-10954-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024] Open

Abstract

The Golgi apparatus is a crucial component of the inner membrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases. Accurate identification of sub-Golgi protein types is therefore essential for developing effective treatments for such diseases. Due to the expensive and time-consuming nature of experimental methods for identifying sub-Golgi protein types, various computational methods have been developed as identification tools. However, the majority of these methods rely solely on neighboring features in the protein sequence and neglect the crucial spatial structure information of the protein. To discover alternative methods for accurately identifying sub-Golgi proteins, we have developed a model called GASIDN. The GASIDN model extracts multi-dimension features by utilizing a 1D convolution module on protein sequences and a graph learning module on contact maps constructed from AlphaFold2. The model utilizes the deep representation learning model SeqVec to initialize protein sequences. GASIDN achieved accuracy values of 98.4% and 96.4% in independent testing and ten-fold cross-validation, respectively, outperforming the majority of previous predictors. To the best of our knowledge, this is the first method that utilizes multi-scale feature fusion to identify and locate sub-Golgi proteins. In order to assess the generalizability and scalability of our model, we conducted experiments to apply it in the identification of proteins from other organelles, including plant vacuoles and peroxisomes. The results obtained from these experiments demonstrated promising outcomes, indicating the effectiveness and versatility of our model. The source code and datasets can be accessed at https://github.com/SJNNNN/GASIDN.
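The abstract above mentions a graph learning module that operates on contact maps. As a rough illustration only, the sketch below shows one common way to turn residue coordinates into a binary contact map and an edge list. The function names, the use of C-alpha coordinates, and the 8 angstrom cutoff are all assumptions made for this example; the abstract does not say how GASIDN builds its maps, and the toy coordinates are invented.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix marking residue pairs within `threshold`.

    coords: list of (x, y, z) positions, one per residue (assumed C-alpha atoms).
    threshold: distance cutoff in angstroms (an assumed, commonly used value).
    """
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


def edge_list(contacts):
    """List each contacting residue pair once, as (i, j) with i < j."""
    n = len(contacts)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if contacts[i][j]]


# Toy coordinates (hypothetical): three residues close together, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_contacts = contact_map(toy_coords)
toy_edges = edge_list(toy_contacts)
```

An edge list of this kind is the usual input to a graph neural network layer, with per-residue sequence features attached to the nodes.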


Sheng D, Jin C, Yue K, Yue M, Liang Y, Xue X, Li P, Zhao G, Zhang L. Pan-cancer atlas of tumor-resident microbiome, immunity and prognosis. Cancer Lett 2024;598:217077. [PMID: 38908541 DOI: 10.1016/j.canlet.2024.217077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/23/2024] [Accepted: 06/14/2024] [Indexed: 06/24/2024]

Su Z, Wu Y, Cao K, Du J, Cao L, Wu Z, Wu X, Wang X, Song Y, Wang X, Duan H. APEX-pHLA: A novel method for accurate prediction of the binding between exogenous short peptides and HLA class I molecules. Methods 2024;228:38-47. [PMID: 38772499 DOI: 10.1016/j.ymeth.2024.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/28/2024] [Accepted: 05/18/2024] [Indexed: 05/23/2024] Open

Machaca V, Goyzueta V, Cruz MG, Sejje E, Pilco LM, López J, Túpac Y. Transformers meets neoantigen detection: a systematic literature review. J Integr Bioinform 2024;21:jib-2023-0043. [PMID: 38960869 PMCID: PMC11377031 DOI: 10.1515/jib-2023-0043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 03/20/2024] [Indexed: 07/05/2024] Open

Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024;15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open

Liu M, Wu T, Li X, Zhu Y, Chen S, Huang J, Zhou F, Liu H. ACPPfel: Explainable deep ensemble learning for anticancer peptides prediction based on feature optimization. Front Genet 2024;15:1352504. [PMID: 38487252 PMCID: PMC10937565 DOI: 10.3389/fgene.2024.1352504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/19/2024] [Indexed: 03/17/2024] Open

Affiliation(s)

Mingyou Liu: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China; Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
Tao Wu: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
Xue Li: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China; Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
Yingxue Zhu: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China; Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
Sen Chen: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
Jian Huang: School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
Fengfeng Zhou: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China; College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
Hongmei Liu: School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China; Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China; College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China


Conev A, Fasoulis R, Hall-Swan S, Ferreira R, Kavraki LE. HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors. iScience 2024;27:108613. [PMID: 38188519 PMCID: PMC10770483 DOI: 10.1016/j.isci.2023.108613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 01/09/2024] Open

Zeng X, Meng FF, Li X, Zhong KY, Jiang B, Li Y. GHGPR-PPIS: A graph convolutional network for identifying protein-protein interaction site using heat kernel with Generalized PageRank techniques and edge self-attention feature processing block. Comput Biol Med 2024;168:107683. [PMID: 37984202 DOI: 10.1016/j.compbiomed.2023.107683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/10/2023] [Accepted: 11/06/2023] [Indexed: 11/22/2023]

Zeng X, Zhong KY, Jiang B, Li Y. Fusing Sequence and Structural Knowledge by Heterogeneous Models to Accurately and Interpretively Predict Drug-Target Affinity. Molecules 2023;28:8005. [PMID: 38138496 PMCID: PMC10745601 DOI: 10.3390/molecules28248005] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/21/2023] [Revised: 12/06/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open

Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023;23:e2300011. [PMID: 37381841 DOI: 10.1002/pmic.202300011] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 05/14/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]

Huang G, Tang X, Zheng P. DeepHLAPred: a deep learning-based method for non-classical HLA binder prediction. BMC Genomics 2023;24:706. [PMID: 37993812 PMCID: PMC10666343 DOI: 10.1186/s12864-023-09796-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 11/08/2023] [Indexed: 11/24/2023] Open

Wang GA, Yan X, Li X, Liu Y, Xia J, Zhu X. MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy. ACS Omega 2023;8:41930-41942. [PMID: 37969991 PMCID: PMC10634282 DOI: 10.1021/acsomega.3c07086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/16/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 11/17/2023]

Wang Y, Wang Z, Liu Y, Yu Q, Liu Y, Luo C, Wang S, Liu H, Liu M, Zhang G, Fan Y, Li K, Huang L, Duan M, Zhou F. Reconstructing the cytokine view for the multi-view prediction of COVID-19 mortality. BMC Infect Dis 2023;23:622. [PMID: 37735372 PMCID: PMC10514938 DOI: 10.1186/s12879-023-08291-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 04/28/2023] [Indexed: 09/23/2023] Open

Abstract

BACKGROUND

Coronavirus disease 2019 (COVID-19) is a rapidly developing and sometimes lethal pulmonary disease. Accurately predicting COVID-19 mortality will facilitate optimal patient treatment and medical resource deployment, but the clinical practice still needs to address it. Both complete blood counts and cytokine levels were observed to be modified by COVID-19 infection. This study aimed to use inexpensive and easily accessible complete blood counts to build an accurate COVID-19 mortality prediction model. The cytokine fluctuations reflect the inflammatory storm induced by COVID-19, but their levels are not as commonly accessible as complete blood counts. Therefore, this study explored the possibility of predicting cytokine levels based on complete blood counts.

METHODS

We used complete blood counts to predict cytokine levels. The predictive model includes an autoencoder, principal component analysis, and linear regression models. We used classifiers such as support vector machine and feature selection models such as adaptive boost to predict the mortality of COVID-19 patients.

RESULTS

Complete blood counts and original cytokine levels reached the COVID-19 mortality classification area under the curve (AUC) values of 0.9678 and 0.9111, respectively, and the cytokine levels predicted by the feature set alone reached the classification AUC value of 0.9844. The predicted cytokine levels were more significantly associated with COVID-19 mortality than the original values.

CONCLUSIONS

Integrating the predicted cytokine levels and complete blood counts improved a COVID-19 mortality prediction model using complete blood counts only. Both the cytokine level prediction models and the COVID-19 mortality prediction models are publicly available at http://www.healthinformaticslab.org/supp/resources.php.
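The results in the abstract above are reported as area under the ROC curve (AUC). For readers unfamiliar with the metric, here is a minimal sketch of one standard way to compute it: the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half. The function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = positive class, e.g. death).
    scores: iterable of classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 8 of the 9 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large cohorts.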


Affiliation(s)

Yueying Wang: College of Computer Science and Technology, Jilin University, 130012, Changchun, China; School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 130012, Changchun, China; Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, 130021, Changchun, Jilin Province, China
Zhao Wang: College of Software, Jilin University, 130012, Changchun, China
Yaqing Liu: College of Computer Science and Technology, Jilin University, 130012, Changchun, China
Qiong Yu: Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, 130021, Changchun, Jilin Province, China
Yujia Liu: College of Software, Jilin University, 130012, Changchun, China
Changfan Luo: College of Software, Jilin University, 130012, Changchun, China
Siyang Wang: College of Software, Jilin University, 130012, Changchun, China
Hongmei Liu: School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 130012, Changchun, China; Engineering Research Center of Medical Biotechnology, Guizhou Medical University, 550025, Guiyang, Guizhou, China
Mingyou Liu: School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China
Gongyou Zhang: School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China
Yusi Fan: College of Software, Jilin University, 130012, Changchun, China
Kewei Li: College of Computer Science and Technology, Jilin University, 130012, Changchun, China; School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China
Lan Huang: College of Computer Science and Technology, Jilin University, 130012, Changchun, China; School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China
Meiyu Duan: College of Computer Science and Technology, Jilin University, 130012, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 130012, Changchun, China
Fengfeng Zhou: College of Computer Science and Technology, Jilin University, 130012, Changchun, China; School of Biology and Engineering, Guizhou Medical University, 550025, Guiyang, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 130012, Changchun, China


Li F, Liu S, Li K, Zhang Y, Duan M, Yao Z, Zhu G, Guo Y, Wang Y, Huang L, Zhou F. EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species. Comput Biol Med 2023;160:107030. [PMID: 37196456 DOI: 10.1016/j.compbiomed.2023.107030] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/02/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]

Affiliation(s)

Fei Li: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Shuai Liu: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Kewei Li: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Yaqi Zhang: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Meiyu Duan: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Zhaomin Yao: College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
Gancheng Zhu: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Yutong Guo: College of Life Sciences, Jilin University, Changchun, Jilin, 130012, China
Ying Wang: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Lan Huang: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
Fengfeng Zhou: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China


Kalemati M, Darvishi S, Koohi S. CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks. Commun Biol 2023;6:492. [PMID: 37147498 PMCID: PMC10162658 DOI: 10.1038/s42003-023-04867-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/02/2022] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open

IUP-BERT: Identification of Umami Peptides Based on BERT Features. Foods 2022;11:foods11223742. [PMID: 36429332 PMCID: PMC9689418 DOI: 10.3390/foods11223742] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/23/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022] Open

Dong B, Li M, Jiang B, Gao B, Li D, Zhang T. Antimicrobial Peptides Prediction method based on sequence multidimensional feature embedding. Front Genet 2022;13:1069558. [PMID: 36468005 PMCID: PMC9714691 DOI: 10.3389/fgene.2022.1069558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 11/02/2022] [Indexed: 09/10/2024] Open

Khare E, Gonzalez-Obeso C, Kaplan DL, Buehler MJ. CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach. ACS Biomater Sci Eng 2022;8:4301-4310. [PMID: 36149671 DOI: 10.1021/acsbiomaterials.2c00737] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]

Abstract

Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature T_m. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-T_m pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for a particular downstream task by utilizing sequence-to-T_m pairings, using a deep convolutional network to translate natural language processing BERT embeddings into required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R² values of test data (R² = 0.84 vs 0.79, respectively), but the ProtBERT is a much larger pretrained model that may not always be applicable for other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters compared to the much larger model but reaches almost the same accuracy for the test set. We compare the performance of both models against 71 newly published sequences for which T_m has been obtained as a validation set and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to our best knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems.
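The abstract above compares models by their R² on test data. As a brief illustration, the sketch below computes the coefficient of determination under its standard definition, 1 - SS_res / SS_tot. The abstract does not state which R² variant the authors used, so this definition is an assumption, and the melting-temperature values are invented toy numbers, not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    y_true: measured values (e.g. melting temperatures).
    y_pred: model predictions for the same samples, in the same order.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Toy example (hypothetical values, degrees Celsius): SS_res = 10, SS_tot = 125.
toy_tm_measured = [30.0, 35.0, 40.0, 45.0]
toy_tm_predicted = [32.0, 34.0, 41.0, 43.0]
toy_r2 = r_squared(toy_tm_measured, toy_tm_predicted)
```

Note that R² measured this way can be negative on held-out data when a model predicts worse than the mean of the true values.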
