1
Li M, Shi W, Zhang F, Zeng M, Li Y. A Deep Learning Framework for Predicting Protein Functions With Co-Occurrence of GO Terms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:833-842. [PMID: 35476573 DOI: 10.1109/tcbb.2022.3170719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
The understanding of protein functions is critical to many biological problems such as the development of new drugs and new crops. To reduce the huge gap between the increase of protein sequences and annotations of protein functions, many methods have been proposed to deal with this problem. These methods use Gene Ontology (GO) to classify the functions of proteins and consider one GO term as a class label. However, they ignore the co-occurrence of GO terms that is helpful for protein function prediction. We propose a new deep learning model, named DeepPFP-CO, which uses Graph Convolutional Network (GCN) to explore and capture the co-occurrence of GO terms to improve the protein function prediction performance. In this way, we can further deduce the protein functions by fusing the predicted propensity of the center function and its co-occurrence functions. We use Fmax and AUPR to evaluate the performance of DeepPFP-CO and compare DeepPFP-CO with state-of-the-art methods such as DeepGOPlus and DeepGOA. The computational results show that DeepPFP-CO outperforms DeepGOPlus and other methods. Moreover, we further analyze our model at the protein level. The results have demonstrated that DeepPFP-CO improves the performance of protein function prediction. DeepPFP-CO is available at https://csuligroup.com/DeepPFP/.
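The Fmax metric named in this abstract can be illustrated with a short sketch. It follows the commonly used protein-centric definition (precision averaged over proteins with at least one prediction at the threshold, recall averaged over all proteins, best F-measure over thresholds). It is not taken from the DeepPFP-CO code; the function name `fmax`, the threshold grid, and the toy GO identifiers are assumptions made for illustration, and every protein is assumed to have at least one true term.

```python
from typing import Dict, Set


def fmax(truth: Dict[str, Set[str]],
         scores: Dict[str, Dict[str, float]],
         steps: int = 100) -> float:
    """Protein-centric Fmax: the best F-measure over a grid of score thresholds.

    truth  maps protein id -> set of true GO terms (assumed non-empty).
    scores maps protein id -> {GO term: predicted score in [0, 1]}.
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # summed over all proteins
        for protein, true_terms in truth.items():
            predicted = {go for go, s in scores.get(protein, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy data, for illustration only.
toy_truth = {"P1": {"GO:1", "GO:2"}, "P2": {"GO:3"}}
toy_scores = {
    "P1": {"GO:1": 0.9, "GO:2": 0.4, "GO:3": 0.2},
    "P2": {"GO:3": 0.8, "GO:1": 0.3},
}
print(fmax(toy_truth, toy_scores))
```

In the toy data a threshold between 0.3 and 0.4 separates every true term from every false one, so the sketch reaches the maximum possible value.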
2
Li K, Quan L, Jiang Y, Li Y, Zhou Y, Wu T, Lyu Q. ctP2ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:297-306. [PMID: 35213314 DOI: 10.1109/tcbb.2022.3154413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a preprocessing of data augmentation with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
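The abstract above mentions a weighted loss for imbalanced classes without giving its form. The following is a minimal sketch of one common choice, a class-weighted binary cross-entropy with inverse-frequency ("balanced") weights. It is an assumption for illustration, not the loss used in ctP2ISP; the names `balanced_class_weights` and `weighted_bce` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def balanced_class_weights(labels: Sequence[int]) -> Tuple[float, float]:
    """Return (w_neg, w_pos) so that both classes carry equal total weight."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * neg), n / (2 * pos)


def weighted_bce(labels: Sequence[int], probs: Sequence[float],
                 w_neg: float, w_pos: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with a separate weight per class."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(labels)


# Hypothetical residue labels: 1 = interaction site, 0 = non-site.
labels = [1, 0, 0, 0]
w_neg, w_pos = balanced_class_weights(labels)
loss = weighted_bce(labels, [0.5, 0.5, 0.5, 0.5], w_neg, w_pos)
print(w_neg, w_pos, loss)
```

With these weights the single positive residue contributes as much to the loss as the three negatives combined, which is the effect a class-weighted loss is meant to have on a skewed training set.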
3
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. Frontiers in Bioinformatics 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Affiliation(s)
- Zhongliang Guo
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Rui Yamaguchi
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan; Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan. Correspondence: Rui Yamaguchi
4
Kaundal R, Loaiza CD, Duhan N, Flann N. deepHPI: a comprehensive deep learning platform for accurate prediction and visualization of host-pathogen protein-protein interactions. Brief Bioinform 2022; 23:6576450. [PMID: 35511057 DOI: 10.1093/bib/bbac125] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/12/2021] [Revised: 02/07/2022] [Accepted: 03/15/2022] [Indexed: 01/06/2023] Open
Abstract
Host-pathogen protein interactions (HPIs) play vital roles in many biological processes and are directly involved in infectious diseases. With pandemics becoming more frequent over the last couple of decades, most recently the Covid-19 outbreak that caused millions of deaths, it has become more critical to develop advanced methods to accurately predict pathogen interactions with their respective hosts. During the last decade, experimental methods to identify HPIs have been used to decipher host-pathogen systems, with the caveat that those techniques are labor-intensive, expensive and time-consuming. Alternatively, HPIs can be predicted accurately with data-driven machine learning. To provide a more robust and accurate solution for the HPI prediction problem, we have developed deepHPI, a tool based on deep learning. The web server delivers four host-pathogen model types: plant-pathogen, human-bacteria, human-virus and animal-pathogen, which extends its applicability to a wide range of analyses and use cases. The deepHPI web tool is the first to use convolutional neural network models for HPI prediction. These models have been selected based on a comprehensive evaluation of protein features and neural network architectures. The best prediction models have been tested on independent validation datasets, achieving an overall Matthews correlation coefficient of 0.87 for animal-pathogen using the combined pseudo-amino acid composition and conjoint triad (PAAC_CT) features, 0.75 for human-bacteria using the combined pseudo-amino acid composition, conjoint triad and normalized Moreau-Broto feature (PAAC_CT_NMBroto), 0.96 for human-virus using PAAC_CT_NMBroto and 0.94 for plant-pathogen interactions using the combined pseudo-amino acid composition, composition and transition feature (PAAC_CTDC_CTDT).
Our server running deepHPI is deployed on a high-performance computing cluster that enables large and multiple user requests, and it provides more information about interactions discovered. It presents an enriched visualization of the resulting host-pathogen networks that is augmented with external links to various protein annotation resources. We believe that the deepHPI web server will be very useful to researchers, particularly those working on infectious diseases. Additionally, many novel and known host-pathogen systems can be further investigated to significantly advance our understanding of complex disease-causing agents. The developed models are established on a web server, which is freely accessible at http://bioinfo.usu.edu/deepHPI/.
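The Matthews correlation coefficient values reported in this abstract are computed from a binary confusion matrix. A minimal sketch of the standard formula follows; the function name `mcc` and the example counts are illustrative assumptions, not results or code from deepHPI.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: perfect, inverted, and chance-level classifiers.
print(mcc(50, 50, 0, 0))
print(mcc(0, 0, 50, 50))
print(mcc(25, 25, 25, 25))
```

Unlike accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often preferred when interacting and non-interacting pairs are not equally frequent.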
Affiliation(s)
- Rakesh Kaundal
- Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences; Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences; Department of Computer Science, College of Science; Utah State University, Logan, 84322 USA
- Cristian D Loaiza
- Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences; Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences
- Naveen Duhan
- Bioinformatics Facility, Center for Integrated BioSystems, College of Agriculture and Applied Sciences; Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences
- Nicholas Flann
- Department of Computer Science, College of Science; Utah State University, Logan, 84322 USA
5
Andleeb S, Abbasi WA, Ghulam Mustafa R, Islam GU, Naseer A, Shafique I, Parween A, Shaheen B, Shafiq M, Altaf M, Ali Abbas S. ESIDE: A computationally intelligent method to identify earthworm species (E. fetida) from digital images: Application in taxonomy. PLoS One 2021; 16:e0255674. [PMID: 34529673 PMCID: PMC8445633 DOI: 10.1371/journal.pone.0255674] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/29/2021] [Accepted: 07/21/2021] [Indexed: 11/19/2022] Open
Abstract
Earthworms (Crassiclitellata), as ecosystem engineers, significantly affect the physical, chemical, and biological properties of the soil by recycling organic material, increasing nutrient availability, and improving soil structure. The ecological efficiency of earthworms varies among species. Therefore, the role of taxonomy in earthworm study is significant. The taxonomy of earthworms cannot reliably be established through morphological characteristics because the small and simple body plan of the earthworm lacks anatomically complex and highly specialized structures. Recently, molecular techniques have been adopted to accurately classify earthworm species, but these techniques are time-consuming and costly. To address this issue, in this study, we propose a machine learning-based earthworm species identification model that uses digital images of earthworms. We performed a stringent performance evaluation not only through 10-fold cross-validation and on an external validation dataset but also in real settings by involving an experienced taxonomist. In all the evaluation settings, our proposed model has given state-of-the-art performance and justified its use to aid earthworm taxonomy studies. We made this model openly accessible through a cloud-based webserver and python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/ESIDE.
Affiliation(s)
- Saiqa Andleeb
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Laboratory, Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Rozina Ghulam Mustafa
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Ghafoor ul Islam
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Anum Naseer
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Irsa Shafique
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Asma Parween
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Bushra Shaheen
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
- Muhamad Shafiq
- Environmental Protection Agency (AJK-EPA), Government of Azad Jammu and Kashmir, Muzaffarabad, AJ&K, Pakistan
- Muhammad Altaf
- Department of Forestry Range and Wildlife Management, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Syed Ali Abbas
- Computational Biology and Data Analysis Laboratory, Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, Pakistan
6
Abbasi WA, Abbas SA, Andleeb S. PANDA: Predicting the change in proteins binding affinity upon mutations by finding a signal in primary structures. J Bioinform Comput Biol 2021; 19:2150015. [PMID: 34126874 DOI: 10.1142/s0219720021500153] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Accurately determining a change in protein binding affinity upon mutations is important to find novel therapeutics and to assist mutagenesis studies. Determining the change in binding affinity upon mutations requires sophisticated, expensive, and time-consuming wet-lab experiments that can be supported with computational methods. Most of the available computational prediction techniques depend on protein structures, which limits their applicability to protein complexes with known 3D structures. In this work, we explore the sequence-based prediction of change in protein binding affinity upon mutation and question the effectiveness of k-fold cross-validation (CV) across mutations adopted in previous studies to assess the generalization ability of such predictors with no known mutation during training. We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the change in protein binding affinity upon mutation. Our proposed sequence-based predictor of change in protein binding affinity, called PANDA, performs comparably to existing methods when gauged through an appropriate CV scheme and an external independent test dataset. On an external test dataset, our proposed method gives a maximum Pearson correlation coefficient of 0.52, compared with 0.59 for the state-of-the-art protein structure-based method MutaBind. Our proposed protein sequence-based method for predicting a change in binding affinity upon mutations has wide applicability and performance comparable to existing protein structure-based methods. We made PANDA easily accessible through a cloud-based webserver and python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/panda, respectively.
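The Pearson correlation coefficient used in this abstract to compare predictors can be sketched as follows. This is the textbook formula in plain Python, not code from PANDA; the function name `pearson` and the affinity-change values are hypothetical and chosen only to show the two extremes.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured and predicted affinity changes (arbitrary units).
measured = [1.0, 2.0, 3.0]
print(pearson(measured, [2.0, 4.0, 6.0]))
print(pearson(measured, [3.0, 2.0, 1.0]))
```

The coefficient measures linear agreement only, so a predictor can score well while still being offset or rescaled relative to the measured values.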
Affiliation(s)
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Syed Ali Abbas
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Saiqa Andleeb
- Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
7
Abbasi WA, Abbas SA, Andleeb S, Ul Islam G, Ajaz SA, Arshad K, Khalil S, Anjam A, Ilyas K, Saleem M, Chughtai J, Abbas A. COVIDC: An expert system to diagnose COVID-19 and predict its severity using chest CT scans: Application in radiology. Informatics in Medicine Unlocked 2021; 23:100540. [PMID: 33644298 PMCID: PMC7901302 DOI: 10.1016/j.imu.2021.100540] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/26/2020] [Revised: 02/17/2021] [Accepted: 02/19/2021] [Indexed: 01/09/2023] Open
Abstract
Early diagnosis of Coronavirus disease 2019 (COVID-19) is highly important, especially when a specific vaccine is absent or in inadequate supply, to stop the surge of this lethal infection by advising quarantine. This diagnosis is challenging because most patients with COVID-19 infection stay asymptomatic, while those showing symptoms are hard to distinguish from patients with other respiratory infections such as severe flu and pneumonia. Because wet-lab diagnostic tests for COVID-19 are costly and time-consuming, there is a pressing need for an alternative, non-invasive, rapid, and low-cost automatic screening system. A chest CT scan can effectively be used as an alternative modality to detect and diagnose the COVID-19 infection. In this study, we present an automatic COVID-19 diagnostic and severity prediction system called COVIDC (COVID-19 detection using CT scans) that uses deep feature maps from chest CT scans for this purpose. Our newly proposed system not only detects COVID-19 but also predicts its severity by using a two-phase classification approach (COVID vs non-COVID, and COVID-19 severity) with deep feature maps and different shallow supervised classification algorithms such as SVMs and random forest to handle data scarcity. We performed a stringent COVIDC performance evaluation not only through 10-fold cross-validation and an external validation dataset but also in a real setting under the supervision of an experienced radiologist. In all the evaluation settings, COVIDC outperformed all the existing state-of-the-art methods designed to detect COVID-19, with an F1 score of 0.94 on the validation dataset, and justified its use to diagnose COVID-19 effectively in the real setting by correctly classifying 9 out of 10 COVID-19 CT scans. We made COVIDC openly accessible through a cloud-based webserver and python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/covidc.
Affiliation(s)
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Syed Ali Abbas
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Saiqa Andleeb
- Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Ghafoor Ul Islam
- Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Syeda Adin Ajaz
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Kinza Arshad
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Sadia Khalil
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Asma Anjam
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Kashif Ilyas
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Mohsib Saleem
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Jawad Chughtai
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
- Ayesha Abbas
- Computational Biology and Data Analysis Lab., Department of Computer Science & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K, 13100, Pakistan
8
COVIDX: Computer-aided diagnosis of COVID-19 and its severity prediction with raw digital chest X-ray scans. Quantitative Biology 2021. [DOI: 10.15302/j-qb-021-0278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
9
Abbasi WA, Yaseen A, Hassan FU, Andleeb S, Minhas FUAA. ISLAND: in-silico proteins binding affinity prediction using sequence information. BioData Min 2020; 13:20. [PMID: 33292419 PMCID: PMC7688004 DOI: 10.1186/s13040-020-00231-w] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/03/2020] [Accepted: 11/15/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and mutagenesis studies. Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. METHOD: We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity. RESULTS: We present our findings that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a sequence-based novel protein binding affinity predictor called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its python code are available at https://sites.google.com/view/wajidarshad/software. CONCLUSION: This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.
Affiliation(s)
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Laboratory, Department of Computer Science and Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, Pakistan; Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Adiba Yaseen
- Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Fahad Ul Hassan
- Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Saiqa Andleeb
- Biotechnology Laboratory, Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, Pakistan
10
Zhang H, Cheng W, Zheng J, Wang P, Liu Q, Li Z, Shi T, Zhou Y, Mao Y, Yu X. Identification and Molecular Characterization of a Pellino Protein in Kuruma Prawn (Marsupenaeus japonicus) in Response to White Spot Syndrome Virus and Vibrio parahaemolyticus Infection. Int J Mol Sci 2020; 21:1243. [PMID: 32069894 PMCID: PMC7072872 DOI: 10.3390/ijms21041243] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/09/2020] [Revised: 01/23/2020] [Accepted: 02/05/2020] [Indexed: 12/22/2022] Open
Abstract
Kuruma prawn, Marsupenaeus japonicus, has the third largest annual yield among shrimp species with vital economic significance in China. White spot syndrome virus (WSSV) is a great threat to the global shrimp farming industry and results in high mortality. Pellino, a highly conserved E3 ubiquitin ligase, has been found to be an important modulator of the Toll-like receptor (TLR) signaling pathways that participate in the innate immune response and ubiquitination. In the present study, the Pellino gene from Marsupenaeus japonicus was identified. A qRT-PCR assay showed the presence of MjPellino in all the tested tissues and revealed that the transcript level of this gene was significantly upregulated in both the gills and hemocytes after challenge with WSSV and Vibrio parahaemolyticus. The function of MjPellino was further verified at the protein level. The results of the three-dimensional modeling and protein-protein docking analyses and a GST pull-down assay revealed that the MjPellino protein was able to bind to the WSSV envelope protein VP26. In addition, the knockdown of MjPellino in vivo significantly decreased the expression of MjAMPs. These results suggest that MjPellino might play an important role in the immune response of kuruma prawn.
Affiliation(s)
- Heqian Zhang
- Joint Laboratory of Guangdong Province and Hong Kong Regions on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou 510642, China
- Wenzhi Cheng
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Jinbin Zheng
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Panpan Wang
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Qinghui Liu
- Joint Laboratory of Guangdong Province and Hong Kong Regions on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou 510642, China
- Zhen Li
- Joint Laboratory of Guangdong Province and Hong Kong Regions on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou 510642, China
- Tianyi Shi
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Yijian Zhou
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Yong Mao
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, Xiamen University, Xiamen 361102, China
- Correspondence: Y.M.; X.Y.
- Xiangyong Yu
- Joint Laboratory of Guangdong Province and Hong Kong Regions on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou 510642, China
- Correspondence: Y.M.; X.Y.
11
Gull S, Shamim N, Minhas F. AMAP: Hierarchical multi-label prediction of biologically active and antimicrobial peptides. Comput Biol Med 2019; 107:172-181. [DOI: 10.1016/j.compbiomed.2019.02.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 12/07/2018] [Revised: 02/17/2019] [Accepted: 02/20/2019] [Indexed: 12/12/2022]
12
Ivan FX, Kwoh CK, Chow VT, Zheng J. Genome Analysis – Identification of Genes Involved in Host-Pathogen Protein-Protein Interaction Networks. Encyclopedia of Bioinformatics and Computational Biology 2019:410-424. [DOI: 10.1016/b978-0-12-809633-8.20124-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
13
Abbasi WA, Asif A, Ben-Hur A, Minhas FUAA. Learning protein binding affinity using privileged information. BMC Bioinformatics 2018; 19:425. [PMID: 30442086 PMCID: PMC6238365 DOI: 10.1186/s12859-018-2448-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/17/2018] [Accepted: 10/25/2018] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Determining protein-protein interactions and their binding affinity is important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to limited availability of protein structure data. RESULTS: In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while expecting only protein sequence information during testing. Using the method, which is based on the framework of learning using privileged information (LUPI), we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework, which uses structure only during training, it is possible to achieve classification performance comparable to that obtained using structure-based features. Evaluation on an independent test set shows improved performance over the PPA-Pred2 method as well. CONCLUSIONS: The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation, but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.
Affiliation(s)
- Wajid Arshad Abbasi
  - Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
  - Information Technology Center (ITC), University of Azad Jammu & Kashmir, Muzaffarabad, Azad Kashmir, 13100, Pakistan
  - Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Amina Asif
  - Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Fayyaz Ul Amir Afsar Minhas
  - Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
|
14
|
Yang KK, Wu Z, Bedbrook CN, Arnold FH. Learned protein embeddings for machine learning. Bioinformatics 2018; 34:2642-2648. [PMID: 29584811 PMCID: PMC6061698 DOI: 10.1093/bioinformatics/bty178] [Citation(s) in RCA: 152] [Impact Index Per Article: 21.7] [Received: 11/30/2017] [Revised: 03/20/2018] [Accepted: 03/22/2018] [Indexed: 12/26/2022] Open
Abstract
Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn. We propose to learn embedded representations of protein sequences that take advantage of the vast quantity of unmeasured protein sequence data available. These embeddings are low-dimensional and can greatly simplify downstream modeling.
Results: The predictive power of Gaussian process models trained using embeddings is comparable to that of models trained on existing representations, which suggests that embeddings enable accurate predictions despite having orders of magnitude fewer dimensions. Moreover, embeddings are simpler to obtain because they do not require alignments, structural data, or selection of informative amino-acid properties. Visualizing the embedding vectors shows that meaningful relationships between the embedded proteins are captured.
Availability and implementation: The embedding vectors and code to reproduce the results are available at https://github.com/fhalab/embeddings_reproduction/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kevin K Yang
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Zachary Wu
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Claire N Bedbrook
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Frances H Arnold
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
|
15
|
Basit AH, Abbasi WA, Asif A, Gull S, Minhas FUAA. Training host-pathogen protein-protein interaction predictors. J Bioinform Comput Biol 2018; 16:1850014. [PMID: 30060698 DOI: 10.1142/s0219720018500142] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Detection of protein-protein interactions (PPIs) plays a vital role in molecular biology. In particular, pathogenic infections are caused by interactions of host and pathogen proteins. It is important to identify host-pathogen interactions (HPIs) to discover new drugs to counter infectious diseases. Conventional wet lab PPI detection techniques have limitations in terms of cost and large-scale application. Hence, computational approaches are developed to predict PPIs. This study aims to develop machine learning models to predict inter-species PPIs with a special interest in HPIs. Specifically, we focus on seeking answers to three questions that arise while developing an HPI predictor: (1) How should negative training examples be selected? (2) Does assigning sample weights to individual negative examples based on their similarity to positive examples improve generalization performance? and (3) How large should the set of negative samples be relative to the positive samples during training and evaluation? We compare two available methods for negative sampling, random sampling and DeNovo sampling, and our experiments show that DeNovo sampling offers better accuracy. However, our experiments also show that generalization performance can be improved further by using a soft DeNovo approach that assigns sample weights to negative examples inversely proportional to their similarity to known positive examples during training. Based on our findings, we have also developed an HPI predictor called HOPITOR (Host-Pathogen Interaction Predictor) that can predict interactions between human and viral proteins. The HOPITOR web server can be accessed at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#HoPItor.
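The weighting idea described in this abstract (negative examples weighted inversely to their similarity to known positives) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_negative_weights`, the `eps` smoothing term, the rescaling to an average weight of 1, and the example similarity scores are all assumptions made for illustration.

```python
def soft_negative_weights(similarities, eps=1e-6):
    """Return one weight per negative example, inversely proportional to similarity.

    `similarities` is a hypothetical list of non-negative scores, one per negative
    example (for instance, its highest similarity to any known positive example).
    `eps` is an assumed smoothing constant that avoids division by zero.
    """
    if not similarities:
        return []
    if any(s < 0 for s in similarities):
        raise ValueError("similarities must be non-negative")
    raw = [1.0 / (s + eps) for s in similarities]
    mean = sum(raw) / len(raw)
    # Rescale so the average weight is 1 (an assumed convention, not from the abstract).
    return [w / mean for w in raw]


# Hypothetical similarity scores for three negative examples.
weights = soft_negative_weights([0.1, 0.5, 0.9])
```

Weights of this kind could then be passed to any learner that accepts per-sample weights during training; negatives that closely resemble known positives contribute less to the loss.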
Affiliation(s)
- Abdul Hannan Basit
  - Department of Computer and Information Sciences, Biomedical Informatics Research Laboratory, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
  - Department of Electrical Engineering, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
- Wajid Arshad Abbasi
  - Department of Computer and Information Sciences, Biomedical Informatics Research Laboratory, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
- Amina Asif
  - Department of Computer and Information Sciences, Biomedical Informatics Research Laboratory, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
- Sadaf Gull
  - Department of Computer and Information Sciences, Biomedical Informatics Research Laboratory, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
- Fayyaz Ul Amir Afsar Minhas
  - Department of Computer and Information Sciences, Biomedical Informatics Research Laboratory, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad 44000, Pakistan
|
16
|
Abbasi WA, Asif A, Andleeb S, Minhas FUAA. CaMELS: In silico prediction of calmodulin binding proteins and their binding sites. Proteins 2017; 85:1724-1740. [DOI: 10.1002/prot.25330] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/09/2016] [Revised: 05/13/2017] [Accepted: 06/07/2017] [Indexed: 11/08/2022]
Affiliation(s)
- Wajid Arshad Abbasi
  - Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Amina Asif
  - Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Saiqa Andleeb
  - Biotechnology Laboratory, Department of Zoology, University of AJ&K, Muzaffarabad, AK, Pakistan
- Fayyaz ul Amir Afsar Minhas
  - Biomedical Informatics Research Laboratory, Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
|
17
|
Choi D, Park B, Chae H, Lee W, Han K. Predicting protein-binding regions in RNA using nucleotide profiles and compositions. BMC SYSTEMS BIOLOGY 2017; 11:16. [PMID: 28361677 PMCID: PMC5374631 DOI: 10.1186/s12918-017-0386-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
Background: Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use.
Results: We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on the same dataset showed that our model is better than the others.
Conclusions: Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. However, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding.
Electronic supplementary material: The online version of this article (doi:10.1186/s12918-017-0386-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Daesik Choi
  - Department of Computer Science and Engineering, Inha University, Incheon, 22212, South Korea
- Byungkyu Park
  - Department of Computer Science and Engineering, Inha University, Incheon, 22212, South Korea
- Hanju Chae
  - Department of Computer Science and Engineering, Inha University, Incheon, 22212, South Korea
- Wook Lee
  - Department of Computer Science and Engineering, Inha University, Incheon, 22212, South Korea
- Kyungsook Han
  - Department of Computer Science and Engineering, Inha University, Incheon, 22212, South Korea
|
18
|
Kim B, Alguwaizani S, Zhou X, Huang DS, Park B, Han K. An improved method for predicting interactions between virus and human proteins. J Bioinform Comput Biol 2016; 15:1650024. [PMID: 27397631 DOI: 10.1142/s0219720016500244] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/26/2023]
Abstract
The interaction of virus proteins with host proteins plays a key role in viral infection and consequent pathogenesis. Many computational methods have been proposed to predict protein-protein interactions (PPIs), but most of them are intended for PPIs within a species rather than PPIs across different species, such as virus-host PPIs. We developed a method that encodes key features of virus and human proteins of variable length as a feature vector of fixed length. The key features include the relative frequency of amino acid triplets (RFAT), the frequency difference of amino acid triplets (FDAT) between virus and host proteins, and amino acid composition (AC). We constructed several support vector machine (SVM) models to evaluate our method and to compare it with others on PPIs between human and two types of viruses: human papillomaviruses (HPV) and hepatitis C virus (HCV). Comparison of our method with others on the same datasets of HPV-human PPIs and HCV-human PPIs showed that the performance of our method is significantly higher than that of the others in all performance measures. Using the SVM model with gene ontology (GO) annotations of proteins, we predicted new HPV-human PPIs. We believe our approach will be useful in predicting heterogeneous PPIs.
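To illustrate how proteins of variable length can be mapped to a fixed-length vector, the sketch below builds triplet-frequency and amino acid composition features for a virus-human pair. The abstract does not give the exact definitions of RFAT, FDAT, or AC, so this sketch assumes plain normalized counts over the 20 standard amino acids and omits FDAT entirely; the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible triplets, in a fixed order so vector positions are stable.
TRIPLETS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def triplet_frequencies(seq):
    """Relative frequency of each overlapping triplet (assumed: simple normalized counts)."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIPLETS)  # ignores triplets with non-standard residues
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(virus_seq, human_seq):
    """Concatenate per-protein features into one vector whose length does not depend on sequence length."""
    return (triplet_frequencies(virus_seq) + composition(virus_seq)
            + triplet_frequencies(human_seq) + composition(human_seq))


# Toy sequences of different lengths yield vectors of identical length.
vec = pair_features("ACDACD", "MKTAYIAKQRQISFVKSHFSRQ")
print(len(vec))
```

A vector of this kind could be passed directly to an SVM, since every protein pair yields the same number of features regardless of sequence length.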
Affiliation(s)
- Byungmin Kim
  - Department of Computer Science and Engineering, Inha University, Incheon 22212, South Korea
- Saud Alguwaizani
  - Department of Computer Science and Engineering, Inha University, Incheon 22212, South Korea
- Xiang Zhou
  - Department of Computer Science and Engineering, Inha University, Incheon 22212, South Korea
- De-Shuang Huang
  - Machine Learning and Systems Biology Lab, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, P. R. China
- Byunkyu Park
  - Department of Computer Science and Engineering, Inha University, Incheon 22212, South Korea
- Kyungsook Han
  - Department of Computer Science and Engineering, Inha University, Incheon 22212, South Korea
|