1
Paul SK, Islam MSU, Akter N, Zohra FT, Rashid SB, Ahmed MS, Rahman SM, Sarkar MAR. Genome-wide identification and characterization of FORMIN gene family in cotton (Gossypium hirsutum L.) and their expression profiles in response to multiple abiotic stress treatments. PLoS One 2025; 20:e0319176. [PMID: 40029892 PMCID: PMC11875364 DOI: 10.1371/journal.pone.0319176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Accepted: 01/29/2025] [Indexed: 03/06/2025]
Abstract
FORMIN proteins, distinguished by the FH2 domain, are conserved throughout evolution and widely distributed in eukaryotic organisms. These proteins interact with various signaling molecules and cytoskeletal proteins, playing crucial roles in both biotic and abiotic stress responses. However, the functions of FORMINs in cotton (Gossypium hirsutum L.) remain unexplored. In this study, 46 FORMIN genes in G. hirsutum (referred to as GhFH) were systematically identified. The gene structures, conserved domains, and motifs of these GhFH genes were thoroughly explored. Phylogenetic and structural analysis classified these 46 GhFH genes into five distinct groups. In silico subcellular localization prediction suggested that GhFH proteins are distributed across various cellular compartments, including the nucleus, extracellular space, cytoplasm, mitochondria, cytoskeleton, plasma membrane, endoplasmic reticulum, and chloroplasts. Evolutionary and functional diversification analyses, based on the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions and on gene duplication events, indicated that GhFH genes have evolved under purifying selection. The analysis of cis-acting elements suggested that GhFH genes may be involved in plant growth, hormone regulation, light response, and stress response. Results from transcription factor (TF) and gene ontology analyses indicate that FORMIN proteins regulate cell wall structure and cytoskeleton dynamics by reacting to hormone signals associated with environmental stress. Additionally, 45 putative ghr-miRNAs from 32 families were identified as targeting 33 GhFH genes. Expression analysis revealed that GhFH1, GhFH10, GhFH20, GhFH24, and GhFH30 exhibited the highest levels of expression under red, blue, and white light conditions. Further, GhFH9, GhFH20, and GhFH30 displayed higher expression levels under heat stress, while GhFH20 and GhFH30 showed increased expression under salt stress compared to controls. These results suggest that the GhFH20 and GhFH30 genes could play significant roles in the development of G. hirsutum under heat and salt stresses. Overall, these findings enhance our understanding of the biological functions of the cotton FORMIN family, offering prospects for developing stress-resistant cotton varieties through manipulation of GhFH gene expression.
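The Ka/Ks reasoning in this abstract can be illustrated with a minimal sketch. It assumes the commonly used reading of the ratio (below 1 suggests purifying selection, about 1 neutral evolution, above 1 positive selection). The function name, the tolerance, and the Ka and Ks values are hypothetical and are not taken from the study.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label a gene pair by its Ka/Ks ratio (conventional interpretation)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical duplicated gene pairs with made-up (Ka, Ks) values.
pairs = {"pairA": (0.02, 0.10), "pairB": (0.30, 0.15)}
for name, (ka, ks) in pairs.items():
    print(name, round(ka / ks, 2), classify_selection(ka, ks))
```

Under this reading, a family in which every duplicated pair has Ka/Ks below 1 would be described as evolving under purifying selection, which is the conclusion the abstract reports for GhFH.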
Affiliation(s)
- Suronjeet Kumar Paul
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md Shohel Ul Islam
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
- Nasrin Akter
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
- Fatema Tuz Zohra
- Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Rajshahi, Rajshahi, Bangladesh
- Shuraya Beente Rashid
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md. Shakil Ahmed
- Department of Biochemistry and Molecular Biology, Faculty of Science, University of Rajshahi, Rajshahi, Bangladesh
- Shaikh Mizanur Rahman
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md. Abdur Rauf Sarkar
- Laboratory of Functional Genomics and Proteomics, Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore, Bangladesh
2
Bhargavi T, Sumathi D. Early detection of abiotic stress in plants through SNARE proteins using hybrid feature fusion model. PeerJ Comput Sci 2024; 10:e2149. [PMID: 39145217 PMCID: PMC11323173 DOI: 10.7717/peerj-cs.2149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 05/31/2024] [Indexed: 08/16/2024]
Abstract
Agriculture is the main source of livelihood for most of the population across the globe. Plants are often considered life savers for humanity, having evolved complex adaptations to cope with adverse environmental conditions. Protecting agricultural produce from devastating conditions such as stress is essential for the sustainable development of the nation. Plants respond to various environmental stressors such as drought, salinity, heat, cold, etc. Abiotic stress can significantly impact crop yield and development posing a major threat to agriculture. SNARE proteins play a major role in pathological processes as they are vital proteins in the life sciences. These proteins act as key players in stress responses. Feature extraction is essential for visualizing the underlying structure of the SNARE proteins in analyzing the root cause of abiotic stress in plants. To address this issue, we developed a hybrid model to capture the hidden structures of the SNAREs. A feature fusion technique has been devised by combining the potential strengths of convolutional neural networks (CNN) with a high dimensional radial basis function (RBF) network. Additionally, we employ a bi-directional long short-term memory (Bi-LSTM) network to classify the presence of SNARE proteins. Our feature fusion model successfully identified abiotic stress in plants with an accuracy of 74.6%. When compared with various existing frameworks, our model demonstrates superior classification results.
Affiliation(s)
- Bhargavi T.
- School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
- Sumathi D.
- School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
3
Huerta M, Franco-Serrano L, Amela I, Perez-Pons JA, Piñol J, Mozo-Villarías A, Querol E, Cedano J. Role of Moonlighting Proteins in Disease: Analyzing the Contribution of Canonical and Moonlighting Functions in Disease Progression. Cells 2023; 12:cells12020235. [PMID: 36672169 PMCID: PMC9857295 DOI: 10.3390/cells12020235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/27/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023]
Abstract
The term moonlighting proteins refers to those proteins that present alternative functions performed by a single polypeptide chain acquired throughout evolution (called canonical and moonlighting, respectively). Over 78% of moonlighting proteins are involved in human diseases, 48% are targeted by current drugs, and over 25% of them are involved in the virulence of pathogenic microorganisms. These facts encouraged us to study the link between the functions of moonlighting proteins and disease. We found a large number of moonlighting functions activated by pathological conditions that are highly involved in disease development and progression. The factors that activate some moonlighting functions take place only in pathological conditions, such as specific cellular translocations or changes in protein structure. Some moonlighting functions are involved in disease promotion while others are involved in curbing it. The disease-impairing moonlighting functions attempt to restore the homeostasis, or to reduce the damage linked to the imbalance caused by the disease. The disease-promoting moonlighting functions primarily involve the immune system, mesenchyme cross-talk, or excessive tissue proliferation. We often find moonlighting functions linked to the canonical function in a pathological context. Moonlighting functions are especially coordinated in inflammation and cancer. Wound healing and epithelial to mesenchymal transition are very representative. They involve multiple moonlighting proteins with a different role in each phase of the process, contributing to the current-phase phenotype or promoting a phase switch, mitigating the damage or intensifying the remodeling. All of this implies a new level of complexity in the study of pathology genesis, progression, and treatment. The specific protein function involved in a patient's progress or that is affected by a drug must be elucidated for the correct treatment of diseases.
4
Varghese DM, Nussinov R, Ahmad S. Predictive modeling of moonlighting DNA-binding proteins. NAR Genom Bioinform 2022; 4:lqac091. [PMID: 36474806 PMCID: PMC9716651 DOI: 10.1093/nargab/lqac091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/06/2022] [Revised: 10/25/2022] [Accepted: 11/11/2022] [Indexed: 09/10/2024]
Abstract
Moonlighting proteins are multifunctional, single-polypeptide chains capable of performing multiple autonomous functions. Most moonlighting proteins have been discovered through work unrelated to their multifunctionality. We believe that prediction of moonlighting proteins from first principles, that is, using sequence, predicted structure, evolutionary profiles, and global gene expression profiles, for only one functional class of proteins in a single organism at a time will significantly advance our understanding of multifunctional proteins. In this work, we investigated human moonlighting DNA-binding proteins (mDBPs) in terms of properties that distinguish them from other (non-moonlighting) proteins with the same DNA-binding protein (DBP) function. Following a careful and comprehensive analysis of discriminatory features, a machine learning model was developed to assess how well mDBPs can be distinguished from other DBPs (oDBPs). We observed that mDBPs can be discriminated from oDBPs with an area under the ROC curve (AUC) of 74% using these first-principles features. A number of novel predicted mDBPs were found to have literature support for being moonlighting, and others are proposed as candidates for which the moonlighting function is currently unknown. We believe that this work will help in deciphering and annotating novel moonlighting DBPs and in scaling up to other functions. The source code and data sets used for this work are freely available at https://zenodo.org/record/7299265#.Y2pO3ctBxPY.
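The AUC figure quoted in this abstract can be made concrete with a minimal sketch of how an area under the ROC curve is computed from classifier scores. It uses the pairwise-ranking (Mann-Whitney) formulation, which is one standard way to obtain AUC and is not necessarily how the authors computed theirs. The labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = mDBP, 0 = oDBP) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.74 means a randomly chosen mDBP outscores a randomly chosen oDBP about 74% of the time.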
Affiliation(s)
- Dana Mary Varghese
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Ruth Nussinov
- Computational Structural Biology Section, Cancer Innovation Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
5
Chen Y, Li S, Guo J. A method for identifying moonlighting proteins based on linear discriminant analysis and bagging-SVM. Front Genet 2022; 13:963349. [PMID: 36046247 PMCID: PMC9420859 DOI: 10.3389/fgene.2022.963349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022]
Abstract
Moonlighting proteins have at least two independent functions and are widely found in animals, plants and microorganisms. Moonlighting proteins play important roles in signal transduction, cell growth and movement, tumor inhibition, DNA synthesis and repair, and metabolism of biological macromolecules. Moonlighting proteins are difficult to find through biological experiments, so many researchers identify them through bioinformatics methods, but the accuracy of these methods is relatively low. Therefore, we propose a new method. In this study, we select SVMProt-188D as the feature input, apply a model combining linear discriminant analysis with basic machine learning classifiers to study moonlighting proteins, and perform a bagging ensemble on the best-performing classifier, the support vector machine. With this approach, moonlighting proteins are identified accurately and efficiently. The model achieves an accuracy of 93.26% and an F-score of 0.946 on the MPFit dataset, which is better than the existing MEL-MP model. It also achieves good results on two other moonlighting protein datasets.
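The two metrics reported in this abstract, accuracy and F-score, can be sketched from confusion-matrix counts as follows. This assumes the F-score is the usual F1 (harmonic mean of precision and recall), which the abstract does not state explicitly. The counts are hypothetical and do not reproduce the paper's results.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts for a moonlighting classifier.
tp, tn, fp, fn = 8, 6, 2, 2
print(accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn))
```

Reporting both metrics matters when classes are unbalanced: accuracy counts true negatives, while the F-score depends only on how well the positive (moonlighting) class is recovered.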
6
SIN-3 functions through multi-protein interaction to regulate apoptosis, autophagy, and longevity in Caenorhabditis elegans. Sci Rep 2022; 12:10560. [PMID: 35732652 PMCID: PMC9217932 DOI: 10.1038/s41598-022-13864-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/09/2022] [Indexed: 11/08/2022]
Abstract
SIN3/HDAC is a multi-protein complex that acts as a regulatory unit and functions as a co-repressor/co-activator and a general transcription factor. SIN3 acts as a scaffold in the complex, binding directly to HDAC1/2 and other proteins, and plays crucial roles in regulating apoptosis, differentiation, cell proliferation, development, and the cell cycle. However, its exact mechanism of action remains elusive. Using the Caenorhabditis elegans (C. elegans) model, we can overcome the challenges posed by the functional redundancy of SIN3 isoforms. In this regard, we have previously demonstrated the role of SIN-3 in uncoupling autophagy and longevity in C. elegans. To understand the mechanism of action of SIN3 in these processes, we carried out a comparative analysis of the SIN3 protein interactome from model organisms of different phyla. We identified conserved, expanded, and contracted gene classes. The C. elegans SIN-3 interactome revealed the presence of well-known proteins, such as DAF-16, SIR-2.1, SGK-1, and AKT-1/2, involved in autophagy, apoptosis, and longevity. Overall, our analyses propose potential mechanisms by which SIN3 participates in multiple biological processes, show their conservation across species, and identify candidate genes for further experimental analysis.
7
Singh RP, Saini N, Sharma G, Rahisuddin R, Patel M, Kaushik A, Kumaran S. Moonlighting Biochemistry of Cysteine Synthase: A Species-specific Global Regulator. J Mol Biol 2021; 433:167255. [PMID: 34547327 DOI: 10.1016/j.jmb.2021.167255] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/11/2021] [Revised: 09/10/2021] [Accepted: 09/12/2021] [Indexed: 11/18/2022]
Abstract
Cysteine Synthase (CS), the enzyme that synthesizes cysteine, performs non-canonical regulatory roles by binding and modulating the functions of disparate proteins. Beyond its role in catalysis and regulation in the cysteine biosynthesis pathway, it exerts its moonlighting effect by binding to a few other proteins that possess a C-terminal "CS-binding motif" ending with a terminal isoleucine (Ile). Therefore, we hypothesized that CS might regulate many other disparate proteins with the "CS-binding motif". In this study, we developed an iterative sequence matching method for mapping the moonlighting biochemistry of CS and validated our predictions by analytical and structural approaches. Using a minimal protein-peptide interaction system, we show that five previously unknown CS-binder proteins that participate in diverse metabolic processes interact with CS in a species-specific manner. Furthermore, results show that signatures of protein-protein interactions, including thermodynamic, competitive-inhibition, and structural features, closely match those of the known CS-binder, serine acetyltransferase (SAT). Together, the results presented in this study allow us to map the extreme multifunctional space (EMS) of CS and reveal the biochemistry of moonlighting space, a subset of EMS. We believe that the integrated computational and experimental workflow developed here could be further modified and extended to study protein-specific moonlighting properties of multifunctional proteins.
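The first filtering idea in this abstract, looking for proteins whose C-terminus ends in isoleucine, can be sketched as a simple sequence screen. This is only an illustration of that single criterion: the abstract does not give the full "CS-binding motif" or the details of the iterative matching method, so nothing beyond the terminal Ile is modeled here. The protein names and sequences are invented.

```python
from typing import Dict, List


def ends_with_ile(sequence: str) -> bool:
    """True if the one-letter protein sequence ends with isoleucine (I)."""
    return sequence.strip().upper().endswith("I")


def candidate_binders(proteome: Dict[str, str]) -> List[str]:
    """Names of proteins passing the terminal-Ile screen, sorted for stable output."""
    return sorted(name for name, seq in proteome.items() if ends_with_ile(seq))


# Hypothetical mini-proteome; sequences are made up for illustration.
proteome = {
    "protX": "MKTAYLLGDI",
    "protY": "MSSLQEVKAL",
    "protZ": "MGHNEDLYVi",
}
print(candidate_binders(proteome))
```

A screen like this only nominates candidates. In the study, predicted binders were then checked with analytical and structural experiments.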
Affiliation(s)
- Ravi Pratap Singh
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
- Neha Saini
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
- Gaurav Sharma
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Electronic city, Bengaluru, Karnataka 560100, India
- R Rahisuddin
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India. https://twitter.com/RahisuddinAlig
- Madhuri Patel
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
- Abhishek Kaushik
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
- S Kumaran
- G. N. Ramachandran Protein Centre, Council of Scientific and Industrial Research (CSIR), Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India.
8
Pan J, Li LP, You ZH, Yu CQ, Ren ZH, Guan YJ. Prediction of Protein-Protein Interactions in Arabidopsis, Maize, and Rice by Combining Deep Neural Network With Discrete Hilbert Transform. Front Genet 2021; 12:745228. [PMID: 34616437 PMCID: PMC8488469 DOI: 10.3389/fgene.2021.745228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/21/2021] [Accepted: 08/18/2021] [Indexed: 11/21/2022]
Abstract
Protein-protein interactions (PPIs) in plants play an essential role in the regulation of biological processes. However, traditional experimental methods are expensive, time-consuming, and need sophisticated technical equipment. These drawbacks motivated the development of novel computational approaches to predict PPIs in plants. In this article, a new deep learning framework, which combines the discrete Hilbert transform (DHT) with deep neural networks (DNN), is presented to predict PPIs in plants. Specifically, plant protein sequences were first transformed into a position-specific scoring matrix (PSSM). Then, DHT was employed to capture features from the PSSM. To improve the prediction accuracy, we used the singular value decomposition algorithm to decrease noise and reduce the dimensions of the feature descriptors. Finally, these feature vectors were fed into the DNN for training and prediction. When applying our method to three plant PPI datasets (Arabidopsis thaliana, maize, and rice), we achieved good predictive performance, with average area under the receiver operating characteristic curve values of 0.8369, 0.9466, and 0.9440, respectively. To fully verify the predictive ability of our method, we compared it with different feature descriptors and machine learning classifiers. Moreover, to further demonstrate the generality of our approach, we also tested it on yeast and human PPI datasets. Experimental results indicated that our method is an efficient and promising computational model for predicting potential interacting plant protein pairs.
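The discrete Hilbert transform step in this pipeline can be sketched with the standard library alone. The version below uses the common DFT-based construction (zero the negative frequencies, double the positive ones, take the imaginary part of the inverse transform). How the authors apply the transform to a PSSM is not stated in the abstract, so applying it to a single numeric column is an assumption, and the SVD and DNN stages are not shown.

```python
import cmath
import math
from typing import List, Sequence


def _dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (adequate for short inputs)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_transform(signal: Sequence[float]) -> List[float]:
    """Discrete Hilbert transform via the analytic-signal construction."""
    n = len(signal)
    if n == 0:
        return []
    spectrum = _dft([complex(v) for v in signal])
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and drop negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = _dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [z.imag for z in analytic]


# Hypothetical PSSM column (one score per sequence position).
column = [2.0, -1.0, 0.0, 3.0, -2.0, 1.0, 0.0, -3.0]
print([round(v, 3) for v in hilbert_transform(column)])
```

A quick sanity check on the construction is that a sampled cosine maps to the matching sine, since the Hilbert transform shifts each frequency component by a quarter cycle.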
Affiliation(s)
- Jie Pan
- School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
9
Liu X, Shen Y, Zhang Y, Liu F, Ma Z, Yue Z, Yue Y. IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models. PeerJ 2021; 9:e11900. [PMID: 34434652 PMCID: PMC8351581 DOI: 10.7717/peerj.11900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/17/2020] [Accepted: 07/13/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND: A moonlighting protein is a protein that can perform two or more functions. Because current moonlighting protein prediction tools mainly focus on proteins in animals and microorganisms, and because cells and proteins differ between animals and plants, existing tools may predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are necessary. METHODS: This study used selected protein feature classes from a data set constructed in house to develop a web-based prediction tool. First, we built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization, and feature dimensionality reduction on the training data. Next, machine learning methods were used for preliminary modeling to select the feature classes that performed best in plant moonlighting protein prediction. The selected features were incorporated into the final plant protein prediction tool. After that, we compared five machine learning methods, used grid search to optimize parameters, and chose the most suitable method as the final model. RESULTS: The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, and it was used as the algorithm to construct the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). The results on the independent test set show that the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher than state-of-the-art non-plant-specific methods, respectively. This further demonstrates that a benchmark data set and a plant-specific prediction tool are required for plant moonlighting protein studies. Finally, we implemented the tool as a web version, which users can access freely at http://identpmp.aielab.net/.
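The percentage gains quoted in the RESULTS can be checked with a short calculation. The sketch assumes they are relative improvements over the baseline score, (new - baseline) / baseline, which matches the reported figures. The function name is hypothetical.

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Scores taken from the abstract: IdentPMP vs. the non-plant-specific baseline.
auprc_gain = relative_gain_percent(0.43, 0.36)
auc_gain = relative_gain_percent(0.68, 0.60)
print(f"AUPRC gain: {auprc_gain:.2f}%")  # 19.44%
print(f"AUC gain: {auc_gain:.2f}%")      # 13.33%
```

These are relative gains. In absolute terms the improvements are 0.07 in AUPRC and 0.08 in AUC.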
Affiliation(s)
- Xinyi Liu
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Yueyue Shen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Fei Liu
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Zhiyu Ma
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
- Yi Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
10
Shirafkan F, Gharaghani S, Rahimian K, Sajedi RH, Zahiri J. Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods. BMC Bioinformatics 2021; 22:261. [PMID: 34030624 PMCID: PMC8142502 DOI: 10.1186/s12859-021-04194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 12/18/2022]
Abstract
Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them are discovered by chance. Therefore, introducing an appropriate computational approach to predict MPs seems reasonable. Results: In this study, we introduced a competent model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We attempted to set up a well-judged scheme for detecting outlier proteins. Consequently, 37 distinct feature vectors were utilized to study each protein's impact on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each of the classification methods was executed 100 times with tenfold cross-validation on the feature vectors; proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of these groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (containing 215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. Besides, the study of outliers showed that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study could lower confidence in the assumption that they are non-moonlighting at all. Conclusions: MPs are difficult to identify through experimentation. Using distinct feature vectors, our method enabled identification of novel moonlighting proteins. The study also indicated that a number of non-MPs are likely to be moonlighting. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04194-5.
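The outlier-detection scheme described in the Results can be sketched as a threshold followed by a set intersection. The sketch assumes that, for each feature vector, one already has a count of how many of the 100 repeated cross-validation runs misclassified each protein. The data layout, function names, and protein identifiers are hypothetical, and the classifiers themselves are not shown.

```python
from typing import Dict, Set


def frequently_misclassified(counts: Dict[str, int], threshold: int = 90) -> Set[str]:
    """Proteins misclassified in at least `threshold` of the repeated runs."""
    return {protein for protein, n in counts.items() if n >= threshold}


def outlier_proteins(per_feature_counts: Dict[str, Dict[str, int]],
                     threshold: int = 90) -> Set[str]:
    """Intersection of the frequently misclassified groups across feature vectors."""
    groups = [frequently_misclassified(c, threshold)
              for c in per_feature_counts.values()]
    return set.intersection(*groups) if groups else set()


# Hypothetical misclassification counts out of 100 runs, per feature vector.
per_feature_counts = {
    "feature_vector_1": {"protA": 95, "protB": 91, "protC": 10},
    "feature_vector_2": {"protA": 100, "protB": 40, "protC": 92},
}
print(outlier_proteins(per_feature_counts))  # {'protA'}
```

Taking the intersection is a conservative choice: a protein is flagged only if it is persistently misclassified regardless of which feature representation is used, which is what makes a labeling error a plausible explanation.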
Affiliation(s)
- Farshid Shirafkan
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
- Karim Rahimian
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Reza Hasan Sajedi
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
- Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
11
Li Y, Zhao J, Liu Z, Wang C, Wei L, Han S, Du W. De novo Prediction of Moonlighting Proteins Using Multimodal Deep Ensemble Learning. Front Genet 2021; 12:630379. [PMID: 33828582 PMCID: PMC8019903 DOI: 10.3389/fgene.2021.630379] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/17/2020] [Accepted: 02/08/2021] [Indexed: 01/04/2023]
Abstract
Moonlighting proteins (MPs) are a special type of protein with multiple independent functions. MPs play vital roles in cellular regulation, diseases, and biological pathways. At present, very few MPs have been discovered by biological experiments. Owing to the lack of data samples, computation-based methods to identify MPs are limited. Currently, there is no de novo prediction method for MPs. Therefore, systematic research and identification of MPs are urgently required. In this paper, we propose a multimodal deep ensemble learning architecture, named MEL-MP, which is the first de novo computation model for predicting MPs. First, we extract four sequence-based features: primary protein sequence information, evolutionary information, physical and chemical properties, and secondary protein structure information. Second, we select specific classifiers for each kind of feature. Finally, we apply a stacked ensemble to integrate the output of each classifier. Through comprehensive model selection and cross-validation experiments, it is shown that specific classifiers for specific feature types can achieve superior performance. To validate the effectiveness of the fusion-based stacked ensemble, different feature fusion strategies, including direct combination and a multimodal deep auto-encoder, are used for comparison. MEL-MP is shown to exhibit superior prediction performance (F-score = 0.891), surpassing the existing machine learning model, MPFit (F-score = 0.784). In addition, MEL-MP is leveraged to predict the potential MPs among all human proteins. Furthermore, the distribution of predicted MPs on different chromosomes, the evolution of MPs, the association of MPs with diseases, and the functional enrichment of MPs are also explored. Finally, for maximum convenience, a user-friendly web server is available at: http://ml.csbg-jlu.site/mel-mp/.
Affiliation(s)
- Ying Li
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Jianing Zhao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Zhaoqian Liu
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, United States
| | - Cankun Wang
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, United States
| | - Lizheng Wei
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, United States
| | - Siyu Han
- Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, United Kingdom
| | - Wei Du
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| |
|
12
|
Espinosa-Cantú A, Cruz-Bonilla E, Noda-Garcia L, DeLuna A. Multiple Forms of Multifunctional Proteins in Health and Disease. Front Cell Dev Biol 2020; 8:451. [PMID: 32587857 PMCID: PMC7297953 DOI: 10.3389/fcell.2020.00451] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 12/03/2019] [Accepted: 05/14/2020] [Indexed: 12/23/2022] Open
Abstract
Protein science has moved from a focus on individual molecules to an integrated perspective in which proteins emerge as dynamic players with multiple functions, rather than monofunctional specialists. Annotation of the full functional repertoire of proteins has impacted the fields of biochemistry and genetics, and will continue to influence basic and applied science questions, from the genotype-to-phenotype problem to our understanding of human pathologies and drug design. In this review, we address the phenomena of pleiotropy, multidomain proteins, promiscuity, and protein moonlighting, providing examples of multitasking biomolecules that underlie specific mechanisms of human disease. In doing so, we place in context different types of multifunctional proteins, highlighting useful attributes for their systematic definition and classification in future research directions.
Affiliation(s)
- Adriana Espinosa-Cantú
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
| | - Erika Cruz-Bonilla
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
| | - Lianet Noda-Garcia
- Department of Plant Pathology and Microbiology, Robert H. Smith Faculty of Agriculture, Food, and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Alexander DeLuna
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados, Guanajuato, Mexico
| |
|
13
|
Long MJC, Zhao Y, Aye Y. Neighborhood watch: tools for defining locale-dependent subproteomes and their contextual signaling activities. RSC Chem Biol 2020; 1:42-55. [PMID: 34458747 PMCID: PMC8341840 DOI: 10.1039/d0cb00041h] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/02/2020] [Accepted: 05/16/2020] [Indexed: 12/21/2022] Open
Abstract
Transient associations between numerous organelles (e.g., the endoplasmic reticulum and the mitochondria) forge highly coordinated, particular environments essential for cross-compartment information flow. Our perspective summarizes chemical-biology tools that have enabled identifying proteins present within these itinerant communities against the bulk proteome, even when a particular protein's presence is fleeting/substoichiometric. However, proteins resident at these ephemeral junctions also experience transitory changes to their interactomes, small-molecule signalomes, and, importantly, functions. Thus, a thorough census of sub-organellar communities necessitates functionally probing context-dependent signaling properties of individual protein-players. Our perspective accordingly further discusses how repurposing of existing tools could allow us to glean a functional understanding of protein-specific signaling activities altered as a result of organelles pulling together. Collectively, our perspective strives to usher in new chemical-biology techniques that could, in turn, open doors to modulate functions of specific subproteomes/organellar junctions underlying the nuanced regulatory subsystem broadly termed contactology.
Affiliation(s)
| | - Yi Zhao
- Swiss Federal Institute of Technology Lausanne (EPFL), Institute of Chemical Sciences and Engineering 1015 Lausanne Switzerland
| | - Yimon Aye
- Swiss Federal Institute of Technology Lausanne (EPFL), Institute of Chemical Sciences and Engineering 1015 Lausanne Switzerland
| |
|
14
|
Zanzoni A, Ribeiro DM, Brun C. Understanding protein multifunctionality: from short linear motifs to cellular functions. Cell Mol Life Sci 2019; 76:4407-4412. [PMID: 31432235 PMCID: PMC11105236 DOI: 10.1007/s00018-019-03273-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/05/2019] [Revised: 08/05/2019] [Accepted: 08/12/2019] [Indexed: 12/28/2022]
Abstract
Moonlighting proteins perform multiple unrelated functions without any change in polypeptide sequence. They can coordinate cellular activities, serving as switches between pathways and helping to respond to changes in the cellular environment. Therefore, regulation of the multiple protein activities, in space and time, is likely to be important for the homeostasis of biological systems. Some moonlighting proteins may perform their multiple functions simultaneously while others alternate between functions due to certain triggers. The switch of the moonlighting protein's functions can be regulated by several distinct factors, including the binding of other molecules such as proteins. We here review the approaches used to identify moonlighting proteins and existing repositories. We particularly emphasise the role played by short linear motifs and PTMs as regulatory switches of moonlighting functions.
Affiliation(s)
- Andreas Zanzoni
- Aix Marseille Univ, INSERM, TAGC, UMR_S1090, Marseille, France
| | - Diogo M Ribeiro
- Aix Marseille Univ, INSERM, TAGC, UMR_S1090, Marseille, France
| | - Christine Brun
- Aix Marseille Univ, INSERM, TAGC, UMR_S1090, Marseille, France.
- CNRS, Marseille, France.
| |
|
15
|
Abstract
Multifunctional genes are important genes because of their essential roles in human cells. Studying and analyzing multifunctional genes can help understand disease mechanisms and drug discovery. We propose a computational method for scoring gene multifunctionality based on functional annotations of the target gene from the Gene Ontology. The method is based on identifying pairs of GO annotations that represent semantically different biological functions and any gene annotated with two annotations from one pair is considered multifunctional. The proposed method can be employed to identify multifunctional genes in the entire human genome using solely the GO annotations. We evaluated the proposed method in scoring multifunctionality of all human genes using four criteria: gene-disease associations; protein-protein interactions; gene studies with PubMed publications; and published known multifunctional gene sets. The evaluation results confirm the validity and reliability of the proposed method for identifying multifunctional human genes. The results across all four evaluation criteria were statistically significant in determining multifunctionality. For example, the method confirmed that multifunctional genes tend to be associated with diseases more than other genes, with significance [Formula: see text]. Moreover, consistent with all previous studies, proteins encoded by multifunctional genes, based on our method, are involved in protein-protein interactions significantly more ([Formula: see text]) than other proteins.
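The pair-based scoring idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the set of "semantically different" GO pairs, the gene names, and the choice to use the count of matched pairs as the score are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical set of GO term pairs judged to represent semantically
# different functions (illustrative only, not taken from the paper).
DISTINCT_PAIRS = {
    frozenset({"GO:0006096", "GO:0003677"}),  # glycolytic process vs DNA binding
    frozenset({"GO:0006096", "GO:0007155"}),  # glycolytic process vs cell adhesion
}

def multifunctionality_score(annotations, distinct_pairs=DISTINCT_PAIRS):
    """Count the annotation pairs of one gene that appear in the distinct-pair set.

    Using the raw count as the score is an assumption of this sketch.
    """
    terms = sorted(set(annotations))
    return sum(
        1 for a, b in combinations(terms, 2)
        if frozenset({a, b}) in distinct_pairs
    )

def is_multifunctional(annotations, distinct_pairs=DISTINCT_PAIRS):
    """A gene carrying both annotations of at least one pair is multifunctional."""
    return multifunctionality_score(annotations, distinct_pairs) > 0

# Hypothetical genes with toy annotation lists.
genes = {
    "geneA": ["GO:0006096", "GO:0003677", "GO:0007155"],
    "geneB": ["GO:0006096"],
}
scores = {name: multifunctionality_score(terms) for name, terms in genes.items()}
```

In this toy data, `geneA` carries both members of two distinct pairs while `geneB` carries none, so only `geneA` would be flagged.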
Affiliation(s)
- Hisham Al-Mubaid
- 1 Computer Science Department, University of Houston-Clear Lake, Houston, TX 77062, USA
| |
|
16
|
Ding Z, Kihara D. Computational identification of protein-protein interactions in model plant proteomes. Sci Rep 2019; 9:8740. [PMID: 31217453 PMCID: PMC6584649 DOI: 10.1038/s41598-019-45072-8] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 01/02/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms were determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and not easy to perform on new target organisms. Large-scale PPI data is particularly sparse in plant organisms. Here, we developed a computational approach for detecting PPIs, trained and tested on known PPIs of Arabidopsis thaliana and applied to three plants, Arabidopsis thaliana, Glycine max (soybean), and Zea mays (maize), to discover new PPIs on a genome scale. Our method considers a variety of features including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first work where a PPI prediction method was developed for and applied on benchmark datasets of Arabidopsis. The method showed a high prediction accuracy of over 90% and very high precision of close to 1.0. We predicted 50,220 PPIs in Arabidopsis thaliana, 13,175,414 PPIs in corn, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence and discussed. Predicted PPIs in the three plant genomes are made available for future reference.
Affiliation(s)
- Ziyun Ding
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, 45229, USA.
| |
|
17
|
Abstract
Proteomics studies that characterize hundreds or thousands of proteins in parallel can play an important part in the identification of moonlighting proteins, proteins that perform two or more distinct and physiologically relevant biochemical or biophysical functions. Functional assays, including ligand-binding assays, can find a surprising second function for a protein that was previously identified as performing a different function, for example, a DNA-binding ability for an enzyme in amino acid metabolism. The results of large-scale assays of protein-protein interactions, gene knockouts, or subcellular protein localizations, or bioinformatics analysis of amino acid sequences and three-dimensional structures, can also be used to predict that a protein has additional functions, but in these cases it is important to use biochemical and biophysical methods to confirm the protein can perform each function.
Affiliation(s)
- Constance Jeffery
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, USA.
| |
|
18
|
Jain A, Gali H, Kihara D. Identification of Moonlighting Proteins in Genomes Using Text Mining Techniques. Proteomics 2018; 18:e1800083. [PMID: 30260564 PMCID: PMC6404977 DOI: 10.1002/pmic.201800083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/28/2018] [Revised: 08/13/2018] [Indexed: 12/31/2022]
Abstract
Moonlighting proteins, an emerging concept in the study of protein function, are proteins with two or more independent and distinct functions. An increasing number of moonlighting proteins have been reported in the past years; however, a systematic study of the topic has been hindered because the secondary functions of proteins are usually found serendipitously by experiments. Toward systematic identification and study of moonlighting proteins, computational methods for identifying moonlighting proteins from several different information sources (database entries, literature, and large-scale omics data) have been developed. In this study, an overview of approaches for finding moonlighting proteins is presented. Then, the literature-mining method, DextMP, is applied to find new moonlighting proteins in three genomes, Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. Potential moonlighting proteins identified by DextMP are further examined by a two-step manual literature checking procedure, which finally yielded 13 new moonlighting proteins. Identified moonlighting proteins are categorized into two classes based on the clarity of the distinctness of two functions of the proteins. A few cases of the identified moonlighting proteins are described in detail. Further directions for improving the DextMP algorithm are also discussed.
Affiliation(s)
- Aashish Jain
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Hareesh Gali
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, 45229, USA
| |
|
19
|
Meng F, Kurgan L. High-throughput prediction of disordered moonlighting regions in protein sequences. Proteins 2018; 86:1097-1110. [DOI: 10.1002/prot.25590] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/11/2018] [Revised: 07/25/2018] [Accepted: 08/05/2018] [Indexed: 01/20/2023]
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering University of Alberta Edmonton Canada
| | - Lukasz Kurgan
- Department of Electrical and Computer Engineering University of Alberta Edmonton Canada
- Department of Computer Science Virginia Commonwealth University Richmond VA
| |
|
20
|
Basavanhally T, Fonseca R, Uversky VN. Born This Way: Using Intrinsic Disorder to Map the Connections between SLITRKs, TSHR, and Male Sexual Orientation. Proteomics 2018; 18:e1800307. [PMID: 30156382 DOI: 10.1002/pmic.201800307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2018] [Revised: 08/03/2018] [Indexed: 12/15/2022]
Abstract
Recently, a genome-wide association study revealed a significant association between specific single nucleotide polymorphisms (SNPs) in men and their sexual orientation. These SNPs (rs9547443 and rs1035144) reside in the intergenic region between the SLITRK5 and SLITRK6 genes and in the intronic region of the TSHR gene and might affect functionality of SLITRK5, SLITRK6, and TSHR proteins that are engaged in tight control of key developmental processes, such as neurite outgrowth and modulation, cellular differentiation, and hormonal regulation. SLITRK5 and SLITRK6 are single-pass transmembrane proteins, whereas TSHR is a heptahelical G protein-coupled receptor (GPCR). Mutations in these proteins are associated with various diseases and are linked to phenotypes found at a higher rate in homosexual men. A bioinformatics analysis of SLITRK5, SLITRK6, and TSHR proteins is conducted to look at their structure, protein interaction networks, and propensity for intrinsic disorder. It is assumed that this information might improve understanding of the roles that SLITRK5, SLITRK6, and TSHR play within neuronal and thyroidal tissues and give insight into the phenotypes associated with male homosexuality.
Affiliation(s)
- Tara Basavanhally
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
| | - Renée Fonseca
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
| | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, 142290, Pushchino, Moscow, Russia
| |
|
21
|
Abstract
Motivation: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have gained more attention in recent years as they are found to play important roles in various systems, including disease development. MPs also have a significant impact on computational function prediction and annotation in databases. Currently, MPs are not labeled as such in biological databases even in cases where multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is an MP or not based on its textual features extracted from scientific literature and the UniProt database. Results: DextMP extracts three categories of textual information for a protein: titles, abstracts from literature, and function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm along with two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words category and Latent Dirichlet Allocation in the topic modeling category. Cross-validation results on a dataset of known MPs and non-MPs showed that DextMP successfully predicted MPs with over 91% accuracy, a significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best performing language models and text-based feature combinations on three genomes, human, yeast and Xenopus laevis, and found that about 2.5–35% of the proteomes are potential MPs. Availability and Implementation: Code available at http://kiharalab.org/DextMP.
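To make the bag-of-words language model named in this abstract concrete, here is a minimal sketch of Term Frequency-Inverse Document Frequency weighting over short function descriptions. It is not DextMP's code: the whitespace tokenization, the plain `log(N / df)` weighting, and the three toy descriptions are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This unsmoothed variant is an assumption of the sketch; libraries often
    apply smoothing and normalization on top of it.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Toy function descriptions (hypothetical, not drawn from UniProt).
descriptions = [
    "enzyme catalyzes glycolysis",
    "enzyme binds dna",
    "receptor binds hormone",
]
vectors = tfidf(descriptions)
```

Terms shared across descriptions (such as "enzyme") receive lower weights than terms unique to one description (such as "glycolysis"), which is what lets a downstream classifier focus on distinguishing vocabulary.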
Affiliation(s)
- Ishita K Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Mansurul Bhuiyan
- Department of Computer Science, Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Science, Purdue University, West Lafayette, IN, USA
| |
|
22
|
Protein Moonlighting Revealed by Noncatalytic Phenotypes of Yeast Enzymes. Genetics 2017; 208:419-431. [PMID: 29127264 DOI: 10.1534/genetics.117.300377] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 10/09/2017] [Accepted: 11/06/2017] [Indexed: 12/19/2022] Open
Abstract
A single gene can partake in several biological processes, and therefore gene deletions can lead to different, sometimes unexpected, phenotypes. However, it is not always clear whether such pleiotropy reflects the loss of a unique molecular activity involved in different processes or the loss of a multifunctional protein. Here, using Saccharomyces cerevisiae metabolism as a model, we systematically test the null hypothesis that enzyme phenotypes depend on a single annotated molecular function, namely their catalysis. We screened a set of carefully selected genes by quantifying the contribution of catalysis to gene deletion phenotypes under different environmental conditions. While most phenotypes were explained by loss of catalysis, slow growth was readily rescued by a catalytically inactive protein in about one-third of the enzymes tested. Such noncatalytic phenotypes were frequent in the Alt1 and Bat2 transaminases and in the isoleucine/valine biosynthetic enzymes Ilv1 and Ilv2, suggesting novel "moonlighting" activities in these proteins. Furthermore, differential genetic interaction profiles of gene deletion and catalytic mutants indicated that ILV1 is functionally associated with regulatory processes, specifically with chromatin modification. Our systematic study shows that gene loss phenotypes and their genetic interactions are frequently not driven by the loss of an annotated catalytic function, underscoring the moonlighting nature of cellular metabolism.
|
23
|
Johnson KL, Cassin AM, Lonsdale A, Bacic A, Doblin MS, Schultz CJ. Pipeline to Identify Hydroxyproline-Rich Glycoproteins. PLANT PHYSIOLOGY 2017; 174:886-903. [PMID: 28446635 PMCID: PMC5462032 DOI: 10.1104/pp.17.00294] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 03/06/2017] [Accepted: 04/21/2017] [Indexed: 05/14/2023]
Abstract
Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as the crucial biological functions of more of these proteins are uncovered. In plants, IDPs are implicated in plant stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), have characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from the protein-rich walls of chlorophyte algae to the cellulose-rich walls of embryophytes. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Commonly divided into the arabinogalactan proteins, extensins, and proline-rich proteins, in reality, a continuum of structures exists within this diverse and heterogeneous superfamily. An inability to accurately classify HRGPs leads to inconsistent gene ontologies, limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline to classify HRGPs into 23 descriptive subclasses. Validation of MAAB was achieved using available genomic resources and then applied to the 1000 Plants transcriptome project (www.onekp.com) data set. Significant improvement in the detection of HRGPs using multiple-k-mer transcriptome assembly methodology was observed. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
Affiliation(s)
- Kim L Johnson
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Andrew M Cassin
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Andrew Lonsdale
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Antony Bacic
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Monika S Doblin
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
| | - Carolyn J Schultz
- School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia
| |
|
24
|
Wei Q, Khan IK, Ding Z, Yerneni S, Kihara D. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology. BMC Bioinformatics 2017; 18:177. [PMID: 28320317 PMCID: PMC5359872 DOI: 10.1186/s12859-017-1600-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 06/10/2016] [Accepted: 03/11/2017] [Indexed: 12/25/2022] Open
Abstract
Background: The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. Results: NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. Conclusions: We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo.
Affiliation(s)
- Qing Wei
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Ishita K Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Ziyun Ding
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Satwica Yerneni
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907, USA
| |
|
25
|
Schuelke M, Øien NC, Oldfors A. Myopathology in the times of modern genetics. Neuropathol Appl Neurobiol 2017; 43:44-61. [DOI: 10.1111/nan.12374] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/05/2016] [Revised: 12/03/2016] [Accepted: 12/23/2016] [Indexed: 12/14/2022]
Affiliation(s)
- M. Schuelke
- Department of Neuropediatrics and NeuroCure Clinical Research Center; Charité-Universitätsmedizin; Berlin Germany
| | - N. C. Øien
- Department of Neuropediatrics and NeuroCure Clinical Research Center; Charité-Universitätsmedizin; Berlin Germany
- Max-Delbrück-Center for Molecular Medicine; Berlin Germany
| | - A. Oldfors
- Department of Pathology and Genetics; Institute of Biomedicine; University of Gothenburg; Gothenburg Sweden
| |
|
26
|
Khan I, McGraw J, Kihara D. MPFit: Computational Tool for Predicting Moonlighting Proteins. Methods Mol Biol 2017; 1611:45-57. [PMID: 28451971 DOI: 10.1007/978-1-4939-7015-5_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
Abstract
An increasing number of proteins have been found which are capable of performing two or more distinct functions. These proteins, known as moonlighting proteins, have drawn much attention recently as they may play critical roles in disease pathways and development. However, because moonlighting proteins are often found serendipitously, our understanding of moonlighting proteins is still quite limited. In order to lay the foundation for systematic studies of moonlighting proteins, we developed MPFit, a software package for predicting moonlighting proteins from their omics features including protein-protein and gene interaction networks. Here, we describe and demonstrate the algorithm of MPFit and the idea behind it, and provide instructions for using the software.
Affiliation(s)
- Ishita Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Joshua McGraw
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| |
|