1
|
Selvaraj MK, Thakur A, Kumar M, Pinnaka AK, Suri CR, Siddhardha B, Elumalai SP. Ion-pumping microbial rhodopsin protein classification by machine learning approach. BMC Bioinformatics 2023; 24:29. [PMID: 36707759 PMCID: PMC9881276 DOI: 10.1186/s12859-023-05138-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Accepted: 01/04/2023] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Rhodopsin is a seven-transmembrane protein covalently linked with retinal chromophore that absorbs photons for energy conversion and intracellular signaling in eukaryotes, bacteria, and archaea. Haloarchaeal rhodopsins are Type-I microbial rhodopsin that elicits various light-driven functions like proton pumping, chloride pumping and Phototaxis behaviour. The industrial application of Ion-pumping Haloarchaeal rhodopsins is limited by the lack of full-length rhodopsin sequence-based classifications, which play an important role in Ion-pumping activity. The well-studied Haloarchaeal rhodopsin is a proton-pumping bacteriorhodopsin that shows promising applications in optogenetics, biosensitized solar cells, security ink, data storage, artificial retinal implant and biohydrogen generation. As a result, a low-cost computational approach is required to identify Ion-pumping Haloarchaeal rhodopsin sequences and its subtype. RESULTS This study uses a support vector machine (SVM) technique to identify these ion-pumping Haloarchaeal rhodopsin proteins. The haloarchaeal ion pumping rhodopsins viz., bacteriorhodopsin, halorhodopsin, xanthorhodopsin, sensoryrhodopsin and marine prokaryotic Ion-pumping rhodopsins like actinorhodopsin, proteorhodopsin have been utilized to develop the methods that accurately identified the ion pumping haloarchaeal and other type I microbial rhodopsins. We achieved overall maximum accuracy of 97.78%, 97.84% and 97.60%, respectively, for amino acid composition, dipeptide composition and hybrid approach on tenfold cross validation using SVM. Predictive models for each class of rhodopsin performed equally well on an independent data set. In addition to this, similar results were achieved using another machine learning technique namely random forest. Simultaneously predictive models performed equally well during five-fold cross validation. Apart from this study, we also tested the own, blank, BLAST dataset and annotated whole-genome rhodopsin sequences of PWS haloarchaeal isolates in the developed methods. The developed web server ( https://bioinfo.imtech.res.in/servers/rhodopred ) can identify the Ion Pumping Haloarchaeal rhodopsin proteins and their subtypes. We expect this web tool would be useful for rhodopsin researchers. CONCLUSION The overall performance of the developed method results show that it accurately identifies the Ionpumping Haloarchaeal rhodopsin and their subtypes using known and unknown microbial rhodopsin sequences. We expect that this study would be useful for optogenetics, molecular biologists and rhodopsin researchers.
Collapse
Affiliation(s)
- Muthu Krishnan Selvaraj
- grid.418099.dMTCC-Microbial Type Culture Collection and Gene Bank, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| | - Anamika Thakur
- grid.418099.dVirology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| | - Manoj Kumar
- grid.418099.dVirology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| | - Anil Kumar Pinnaka
- grid.418099.dMTCC-Microbial Type Culture Collection and Gene Bank, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| | - Chander Raman Suri
- grid.418099.dBiosensor Department, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| | - Busi Siddhardha
- grid.412517.40000 0001 2152 9956Department of Microbiology, School of Life Sciences, Pondicherry University, Puducherry, 605014 India
| | - Senthil Prasad Elumalai
- grid.418099.dBiochemical Engineering Research and Process Development Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
| |
Collapse
|
2
|
Selvaraj MK, Kaur J. Computational method for aromatase-related proteins using machine learning approach. PLoS One 2023; 18:e0283567. [PMID: 36989252 PMCID: PMC10057777 DOI: 10.1371/journal.pone.0283567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Accepted: 03/12/2023] [Indexed: 03/30/2023] Open
Abstract
Human aromatase enzyme is a microsomal cytochrome P450 and catalyzes aromatization of androgens into estrogens during steroidogenesis. For breast cancer therapy, third-generation aromatase inhibitors (AIs) have proven to be effective; however patients acquire resistance to current AIs. Thus there is a need to predict aromatase-related proteins to develop efficacious AIs. A machine learning method was established to identify aromatase-related proteins using a five-fold cross validation technique. In this study, different SVM approach-based models were built using the following approaches like amino acid, dipeptide composition, hybrid and evolutionary profiles in the form of position-specific scoring matrix (PSSM); with maximum accuracy of 87.42%, 84.05%, 85.12%, and 92.02% respectively. Based on the primary sequence, the developed method is highly accurate to predict the aromatase-related proteins. Prediction scores graphs were developed using the known dataset to check the performance of the method. Based on the approach described above, a webserver for predicting aromatase-related proteins from primary sequence data was developed and implemented at https://bioinfo.imtech.res.in/servers/muthu/aromatase/home.html. We hope that the developed method will be useful for aromatase protein related research.
Collapse
Affiliation(s)
| | - Jasmeet Kaur
- Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
| |
Collapse
|
4
|
Zhang Y, Yu S, Xie R, Li J, Leier A, Marquez-Lago TT, Akutsu T, Smith AI, Ge Z, Wang J, Lithgow T, Song J. PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins. Bioinformatics 2020; 36:704-712. [PMID: 31393553 DOI: 10.1093/bioinformatics/btz629] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Revised: 07/17/2019] [Accepted: 08/07/2019] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, 'non-classical' secreted proteins are difficult to identify as they lack discernable signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of 'non-classical' secreted proteins from sequence data. RESULTS In this work, we first constructed a high-quality dataset of experimentally verified 'non-classical' secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy. All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved a superior performance with an accuracy of 0.900, an F-value of 0.903, Matthew's correlation coefficient of 0.803 and an area under the curve value of 0.963, and outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users' demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors. AVAILABILITY AND IMPLEMENTATION http://pengaroo.erc.monash.edu/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yanju Zhang
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
| | - Sha Yu
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia
| | - Ruopeng Xie
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia
| | - Jiahui Li
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
| | - André Leier
- Department of Genetics, AL, USA.,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatiana T Marquez-Lago
- Department of Genetics, AL, USA.,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - A Ian Smith
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia.,ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
| | - Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
| | - Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
| | - Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
| | - Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia.,ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
| |
Collapse
|