1
Imandi SB, Karanam SK, Nagumantri R, Srivastava RK, Sarangi PK. Neural networks and genetic algorithm as robust optimization tools for modeling the microbial production of poly‐β‐hydroxybutyrate (PHB) from Brewers’ spent grain. Biotechnol Appl Biochem 2022. [DOI: 10.1002/bab.2412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Accepted: 10/23/2022] [Indexed: 11/09/2022]
Affiliation(s)
- Sarat Babu Imandi
  - Department of Biotechnology, GITAM School of Technology, Gandhi Institute of Technology and Management (GITAM) Deemed to be University, Gandhinagar, Rushikonda, Visakhapatnam 530045, India
- Radhakrishna Nagumantri
  - Department of Biotechnology, GITAM School of Technology, Gandhi Institute of Technology and Management (GITAM) Deemed to be University, Gandhinagar, Rushikonda, Visakhapatnam 530045, India
- Rajesh K. Srivastava
  - Department of Biotechnology, GITAM School of Technology, Gandhi Institute of Technology and Management (GITAM) Deemed to be University, Gandhinagar, Rushikonda, Visakhapatnam 530045, India
2
SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med 2022; 146:105704. [PMID: 35690478 DOI: 10.1016/j.compbiomed.2022.105704] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/19/2022] [Revised: 05/15/2022] [Accepted: 06/04/2022] [Indexed: 11/22/2022]
Abstract
Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes, so computational methods that identify TPPs accurately and rapidly are needed. Several computational methods have been developed for TPP identification; however, some limitations in performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, without any need for structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, in the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.
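The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function names and the example counts are hypothetical and are not taken from the SAPPHIRE paper or its repository.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 100 sequences.
    tp, tn, fp, fn = 45, 45, 5, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when class sizes differ.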
3
Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022; 21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs from protein sequences through computational models is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, have been developed for in silico prediction of TPPs. It is therefore desirable to revisit these methods and summarize their advantages and disadvantages in order to develop new computational approaches that predict TPPs more accurately. With this goal in mind, we comprehensively investigate fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms employed (i.e., computational black-box methods and computational white-box methods). We also conducted a comparative study of several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor most suitable for their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
- Nalini Schaduangrat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Mohammad Ali Moni
  - School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia
- Pietro Lió
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
4
Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci Rep 2021; 11:23782. [PMID: 34893688 PMCID: PMC8664844 DOI: 10.1038/s41598-021-03293-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/23/2021] [Accepted: 12/01/2021] [Indexed: 02/08/2023] Open
Abstract
Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and in a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.
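The combination of g-gap dipeptide composition with a table of propensity scores can be illustrated with a short sketch. It assumes the commonly described form of the scoring card method, in which a sequence score is the composition-weighted sum of dipeptide propensity scores and a threshold separates the two classes. The propensity values, the threshold, and all function names below are hypothetical; in the paper the scores are estimated from training data, which this sketch does not attempt.

```python
from collections import Counter


def g_gap_dipeptide_composition(seq: str, g: int = 0) -> dict[str, float]:
    """Relative frequency of residue pairs separated by g positions.

    g = 0 gives ordinary adjacent dipeptides; g = 1 pairs residue i
    with residue i + 2, and so on.
    """
    seq = seq.upper()
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        return {}
    counts = Counter(seq[i] + seq[i + g + 1] for i in range(n_pairs))
    return {pair: count / n_pairs for pair, count in counts.items()}


def scm_score(seq: str, propensity: dict[str, float], g: int = 0) -> float:
    """Composition-weighted sum of dipeptide propensity scores.

    Dipeptides missing from the table contribute 0 (an assumption
    made here for simplicity).
    """
    composition = g_gap_dipeptide_composition(seq, g)
    return sum(freq * propensity.get(pair, 0.0)
               for pair, freq in composition.items())


def is_thermophilic(seq: str, propensity: dict[str, float],
                    threshold: float, g: int = 0) -> bool:
    """Classify a sequence by comparing its score with a threshold."""
    return scm_score(seq, propensity, g) > threshold


if __name__ == "__main__":
    # Hypothetical propensity scores for two dipeptides only.
    toy_scores = {"AK": 600.0, "KA": 300.0}
    print(g_gap_dipeptide_composition("AKAK", g=0))
    print(scm_score("AKAK", toy_scores))
    print(is_thermophilic("AKAK", toy_scores, threshold=450.0))
```

Because each dipeptide carries a single score, the table itself can be inspected to see which residue pairs push a prediction toward the thermophilic class, which is the interpretability argument made in the abstract.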
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200 Thailand (grid.7132.7; 0000 0000 9039 7662)
- Warot Chotpatiwetchkul
  - Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520 Thailand (grid.419784.7; 0000 0001 0816 7508)
- Vannajan Sanghiran Lee
  - Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science, University of Malaya, 50603 Kuala Lumpur, Malaysia (grid.10347.31; 0000 0001 2308 5949)
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700 Thailand (grid.10223.32; 0000 0004 1937 0490)
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
5
Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020; 8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023] Open
Abstract
Thermophilicity is a very important property of proteins, as it sometimes determines denaturation and cell death. Thus, methods for predicting thermophilic proteins and non-thermophilic proteins are of interest and can contribute to the design and engineering of proteins. In this article, we describe the use of feature dimension reduction technology and LIBSVM to identify thermophilic proteins. The highest accuracy obtained by cross-validation was 96.02% with 119 parameters. When using only 16 features, we obtained an accuracy of 93.33%. We discuss the importance of the different characteristics in identification and report a comparison of the performance of support vector machine to that of other methods.
Affiliation(s)
- Zifan Guo
  - School of Aeronautics and Astronautic, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pingping Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Zhendong Liu
  - School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Yuming Zhao
  - Information and Computer Engineering College, Northeast Forestry University, Harbin, China
6
Panja AS, Nag A, Bandopadhyay B, Maiti S. Protein Stability Determination (PSD): A Tool for Proteomics Analysis. Curr Bioinform 2018. [DOI: 10.2174/1574893613666180315121614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: Protein Stability Determination (PSD) is a sequence-based bioinformatics tool that was developed using a large set of protein sequences in FASTA format. PSD can be used to analyze meta-proteomics data, which will help to predict and design thermozymes and mesozymes for academic and industrial purposes. PSD can also be used to analyze a protein sequence and predict whether it will be stable in a thermophilic or a mesophilic environment.
Method and Results: The tool is written in Java, runs on any operating system, and provides a user-friendly graphical interface. It is a simple program and can predict the thermostability of proteins with >90% accuracy. PSD can also predict the nature of the constituent amino acids, i.e., acidic or basic, polar or nonpolar, etc.
Conclusion: PSD can determine the thermostability status of hypothetical or unknown peptides as well as of meta-proteomics data from any established database. The utilities of PSD-driven analyses include predictions on the functional assignment of a protein. PSD also helps in designing peptides having flexible combinations of amino acids for functional stability. PSD is freely available at https://sourceforge.net/projects/protein-sequence-determination.
Affiliation(s)
- Anindya Sundar Panja
  - Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
- Akash Nag
  - Department of Computer Science, University of Burdwan, India
- Bidyut Bandopadhyay
  - Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
- Smarajit Maiti
  - Post Graduate Department of Biochemistry and Biotechnology, Cell and Molecular Therapeutics Laboratory, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
7
Meng F, Xing G, Li Y, Song J, Wang Y, Meng Q, Lu J, Zhou Y, Liu Y, Wang D, Teng L. The optimization of Marasmius androsaceus submerged fermentation conditions in five-liter fermentor. Saudi J Biol Sci 2016; 23:S99-S105. [PMID: 26858573 PMCID: PMC4705249 DOI: 10.1016/j.sjbs.2015.06.022] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/10/2015] [Revised: 06/20/2015] [Accepted: 06/22/2015] [Indexed: 11/22/2022] Open
Abstract
Using a desirability function, four indexes (mycelium dry weight, intracellular polysaccharide, adenosine and mannitol yield) were unified into one expected value (Da), which served as the assessment criterion. In the present study, a Plackett-Burman design was applied to evaluate the effects of eight variables (initial pH, rotating speed, culture temperature, inoculum size, ventilation volume, culture time, inoculum age and loading volume) on the Da value during Marasmius androsaceus submerged fermentation in a five-liter fermentor. Culture time, initial pH and rotating speed were found to influence the Da value significantly and were further optimized by a Box-Behnken design. Results obtained from the Box-Behnken design were analyzed by both response surface regression (Design-Expert.V8.0.6.1 software) and an artificial neural network combined with a genetic algorithm (Matlab2012a software). After comparison, the optimum M. androsaceus submerged fermentation conditions in a five-liter fermentor were obtained as follows: initial pH of 6.14, rotating speed of 289.3 rpm, culture time of 6.285 days, culture temperature of 26 °C, inoculum size of 5%, ventilation volume of 200 L/h, inoculum age of 4 days, and loading volume of 3.5 L/5 L. The predicted Da value of the optimum model was 0.4884 and the average experimental Da value was 0.4760. The model shows good fit and predictive ability.
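The abstract does not state which desirability function was used to merge the four indexes into Da. As an illustration only, the sketch below assumes the widely used Derringer-Suich form: each response is mapped to a "larger is better" desirability in [0, 1], and the overall value is the geometric mean of the individual desirabilities. The response ranges and measurements in the example are hypothetical.

```python
import math


def desirability_larger_is_better(y: float, low: float, high: float,
                                  weight: float = 1.0) -> float:
    """Map a response to [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def overall_desirability(values: list[float]) -> float:
    """Geometric mean of individual desirabilities.

    Any single zero forces the overall value to zero, so no response
    can be sacrificed entirely in favour of the others.
    """
    if not values:
        raise ValueError("at least one desirability is required")
    if any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical (measurement, low, high) triples for four responses.
    responses = [
        (12.0, 5.0, 20.0),   # mycelium dry weight
        (0.8, 0.2, 1.5),     # intracellular polysaccharide
        (3.0, 1.0, 6.0),     # adenosine
        (40.0, 10.0, 80.0),  # mannitol
    ]
    ds = [desirability_larger_is_better(y, lo, hi) for y, lo, hi in responses]
    print(f"Da = {overall_desirability(ds):.4f}")
```

Collapsing several responses into one score lets a single-objective optimizer, such as response surface regression or a genetic algorithm over a fitted network, search the factor space directly.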
Affiliation(s)
- Fanxin Meng
  - Zhuhai College, Jilin University, Zhuhai 519041, Guangdong Province, China
- Gaoyang Xing
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Yutong Li
  - Norma Bethune Health Science Center, Jilin University, Changchun 130021, Jilin Province, China
- Jia Song
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Yanzhen Wang
  - Zhuhai College, Jilin University, Zhuhai 519041, Guangdong Province, China
- Qingfan Meng
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Jiahui Lu
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Yulin Zhou
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Yan Liu
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Di Wang
  - Zhuhai College, Jilin University, Zhuhai 519041, Guangdong Province, China
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China
- Lirong Teng
  - Zhuhai College, Jilin University, Zhuhai 519041, Guangdong Province, China
  - School of Life Sciences, Jilin University, Changchun 130012, Jilin Province, China