1. Qian J, Jin P, Yang Y, Ma N, Yang Z, Zhang X. Protein function annotation and virulence factor identification of Klebsiella pneumoniae genome by multiple machine learning models. Microb Pathog 2024; 193:106727. [PMID: 38851362 DOI: 10.1016/j.micpath.2024.106727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 05/27/2024] [Accepted: 06/03/2024] [Indexed: 06/10/2024]
Abstract
Klebsiella pneumoniae is a Gram-negative bacterium that can cause a range of infections in humans. In recent years, a growing number of K. pneumoniae strains resistant to multiple antibiotics have emerged, posing a significant threat to public health. The functions of this bacterium's proteins are not well understood, so a systematic investigation of the K. pneumoniae proteome is urgently needed. In this study, the protein functions of this bacterium were re-annotated and their functional groups were analyzed. Moreover, three machine learning models were built to identify novel virulence factors. The functions of 16 uncharacterized proteins were annotated for the first time by sequence alignment. In addition, K. pneumoniae proteins share a high proportion of homology with Haemophilus influenzae and a low proportion with Chlamydia pneumoniae. Sequence analysis identified 10 proteins as potential drug targets for this bacterium. Our model achieved an accuracy of 0.901 on the benchmark dataset. By applying our models to K. pneumoniae, we identified 39 virulence factors in this pathogen. Our findings could provide novel clues for the treatment of K. pneumoniae infection.
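The abstract does not say how the outputs of the three models are combined before virulence factors are called. Purely as an illustrative assumption, the sketch below combines per-protein binary predictions by simple majority vote and scores them with plain accuracy, (TP + TN) / N, the metric the abstract reports. The function names are hypothetical and are not taken from the study.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine binary predictions (1 = virulence factor, 0 = not) from an
    odd number of models by simple majority.

    Assumes every model scored the same proteins in the same order.
    """
    n_models = len(predictions)
    if n_models == 0 or n_models % 2 == 0:
        raise ValueError("need an odd, non-zero number of models")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must score the same proteins")
    # A protein is called positive when more than half of the models agree.
    return [int(sum(column) * 2 > n_models) for column in zip(*predictions)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly: (TP + TN) / N."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

Majority voting is only one possible design. An intersection (all models agree) would give fewer, higher-confidence calls, and a union would give more, lower-confidence calls. The abstract does not indicate which strategy, if any, the authors used.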
Affiliation(s)
- Jinyang Qian
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Pengfei Jin
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Yueyue Yang
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Nan Ma
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Zhiyuan Yang
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Xiaoli Zhang
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
2. Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth through thermal hysteresis. Classifying AFPs from sequence is difficult because of their low sequence similarity; consequently, standard similarity-search algorithms such as BLAST and PSI-BLAST are not effective. Here, we propose a method that combines n-gram feature vectors with machine learning models to accelerate the identification of potential AFPs from sequences. The n-gram features are extracted by k-mer counting. A comparative analysis of different machine learning models shows that XGBoost outperforms the others in predicting AFPs from sequence when penta-mers are used as the feature vector. When tested on an independent dataset, our method outperformed existing methods, with a sensitivity of 97.50%, recall of 98.30%, and F1 score of 99.10%. Finally, we used the SHAP method to interpret the model, which provides important insight into the functional activity of AFPs.
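The abstract names k-mer counting as the source of the n-gram features but gives no further detail. The sketch below is a minimal version that assumes overlapping k-mers over the 20 standard amino acids and normalizes counts to frequencies. Both choices are assumptions, not the authors' documented pipeline. It uses a sparse mapping because a dense penta-mer vector would have 20**5 = 3,200,000 entries.

```python
from collections import Counter

# The 20 standard amino acids. In this sketch, any k-mer containing another
# letter (e.g. X, B, Z) is skipped rather than counted.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def kmer_counts(sequence: str, k: int = 5) -> Counter:
    """Count overlapping k-mers (n-grams) in a protein sequence.

    Returns a sparse Counter keyed by k-mer string; k=5 gives penta-mers.
    """
    if k < 1:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= AMINO_ACIDS:
            counts[kmer] += 1
    return counts


def kmer_frequencies(sequence: str, k: int = 5) -> dict[str, float]:
    """Normalize k-mer counts by the number of valid k-mers so that
    sequences of different lengths are comparable (an assumed step)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}
```

To train a classifier such as XGBoost on these per-sequence dictionaries, they must first be mapped into one shared column space. For example, scikit-learn's `DictVectorizer` turns a list of such dicts into a sparse matrix.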
Affiliation(s)
- Saikat Dhibar
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Biman Jana
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
3. Zulfiqar H, Guo Z, Grace-Mercure BK, Zhang ZY, Gao H, Lin H, Wu Y. Empirical comparison and recent advances of computational prediction of hormone binding proteins using machine learning methods. Comput Struct Biotechnol J 2023; 21:2253-2261. [PMID: 37035551 PMCID: PMC10073991 DOI: 10.1016/j.csbj.2023.03.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/21/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023]
Abstract
Hormone binding proteins (HBPs) are soluble carrier proteins. They interact selectively and non-covalently with hormones and promote growth hormone signaling in humans and other animals. HBPs are useful in many medical and commercial fields, so identifying them is important for learning more about how these proteins function. However, experimental methods for recognizing HBPs are time-consuming and expensive. Computational methods that combine sequence information with machine learning (ML) algorithms have played a significant role in accurately recognizing HBPs. In this review, we compared and assessed ML-based tools for HBP recognition. We hope that this study will provide useful background and guidance for research on HBPs.
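Comparisons of binary protein predictors such as HBP tools commonly report sensitivity (Sn), specificity (Sp), accuracy (ACC), and the Matthews correlation coefficient (MCC). The abstract does not list the metrics this review uses, so this set is an assumption. The sketch below computes them from confusion-matrix counts. The `ConfusionCounts` type and the zero-denominator convention are illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # HBPs correctly predicted as HBPs
    tn: int  # non-HBPs correctly predicted as non-HBPs
    fp: int  # non-HBPs wrongly predicted as HBPs
    fn: int  # HBPs missed by the predictor


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return Sn, Sp, ACC and MCC.

    In this sketch, a metric whose denominator is zero is reported as 0.0.
    """

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sn = ratio(c.tp, c.tp + c.fn)  # sensitivity (recall)
    sp = ratio(c.tn, c.tn + c.fp)  # specificity
    acc = ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    mcc = ratio(c.tp * c.tn - c.fp * c.fn, mcc_den)
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}
```

MCC is often preferred when comparing tools on imbalanced benchmarks because it uses all four confusion-matrix cells, whereas ACC can look high when one class dominates.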
Affiliation(s)
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhiling Guo
  - Beidahuang Industry Group General Hospital, Harbin, China
- Bakanina Kissanga Grace-Mercure
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
  - School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yun Wu
  - College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
4. Zheng Y, Song K, Xie ZX, Han MZ, Guo F, Yuan YJ. Machine learning-aided scoring of synthesis difficulties for designer chromosomes. Sci China Life Sci 2023. [PMID: 36881317 DOI: 10.1007/s11427-023-2306-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/06/2023] [Accepted: 02/23/2023] [Indexed: 03/08/2023]
Abstract
Designer chromosomes are artificially synthesized chromosomes. They now have numerous applications, ranging from medical research to biofuel development. However, some chromosome fragments can interfere with the chemical synthesis of designer chromosomes, ultimately limiting the widespread use of this technology. To address this issue, this study aimed to develop an interpretable machine learning framework that predicts and quantifies the synthesis difficulties of designer chromosomes in advance. Using this framework, six key sequence features that lead to synthesis difficulties were identified, and an eXtreme Gradient Boosting (XGBoost) model was built to integrate them. The model achieved an AUC of 0.895 in cross-validation and 0.885 on an independent test set. Based on these results, a synthesis difficulty index (S-index) was proposed to score and interpret the synthesis difficulties of chromosomes from prokaryotes to eukaryotes. The findings highlight the substantial variability in synthesis difficulty between chromosomes and demonstrate the potential of the proposed model to predict and mitigate these difficulties by optimizing the synthesis process and rewriting genomes.
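The AUC values reported above measure how well a model ranks positive examples above negative ones. The sketch below computes ROC AUC from its pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The labeling scheme in the docstring (1 = difficult-to-synthesize fragment) is an assumption about how such a task could be framed, not the study's documented setup. The O(P*N) pairwise loop favors clarity over speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for the positive class (hypothetically, a fragment that was
            hard to synthesize), 0 otherwise.
    scores: model outputs, where a higher score means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For real datasets, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity far more efficiently. The explicit loop here only makes the definition visible.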
Affiliation(s)
- Yan Zheng
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Kai Song
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Ze-Xiong Xie
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Ming-Zhe Han
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Fei Guo
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Ying-Jin Yuan
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
5. Ali F, Kumar H, Patil S, Ahmad A, Babour A, Daud A. Deep-GHBP: Improving prediction of Growth Hormone-binding proteins using deep learning model. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103856] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]