1
|
Kumar S, Teli MK, Kim MH. GPCR-IPL score: multilevel featurization of GPCR-ligand interaction patterns and prediction of ligand functions from selectivity to biased activation. Brief Bioinform 2024; 25:bbae105. [PMID: 38517694 PMCID: PMC10959162 DOI: 10.1093/bib/bbae105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Revised: 02/12/2024] [Accepted: 02/27/2024] [Indexed: 03/24/2024] Open
Abstract
G-protein-coupled receptors (GPCRs) mediate diverse cell signaling cascades after recognizing extracellular ligands. Despite the successful history of known GPCR drugs, a lack of mechanistic insight into GPCR challenges both the deorphanization of some GPCRs and optimization of the structure-activity relationship of their ligands. Notably, replacing a small substituent on a GPCR ligand can significantly alter extracellular GPCR-ligand interaction patterns and motion of transmembrane helices in turn to occur post-binding events of the ligand. In this study, we designed 3D multilevel features to describe the extracellular interaction patterns. Subsequently, these 3D features were utilized to predict the post-binding events that result from conformational dynamics from the extracellular to intracellular areas. To understand the adaptability of GPCR ligands, we collected the conformational information of flexible residues during binding and performed molecular featurization on a broad range of GPCR-ligand complexes. As a result, we developed GPCR-ligand interaction patterns, binding pockets, and ligand features as score (GPCR-IPL score) for predicting the functional selectivity of GPCR ligands (agonism versus antagonism), using the multilevel features of (1) zoomed-out 'residue level' (for flexible transmembrane helices of GPCRs), (2) zoomed-in 'pocket level' (for sophisticated mode of action) and (3) 'atom level' (for the conformational adaptability of GPCR ligands). GPCR-IPL score demonstrated reliable performance, achieving area under the receiver operating characteristic of 0.938 and area under the precision-recall curve of 0.907 (available in gpcr-ipl-score.onrender.com). Furthermore, we used the molecular features to predict the biased activation of downstream signaling (Gi/o, Gq/11, Gs and β-arrestin) as well as the functional selectivity. The resulting models are interpreted and applied to out-of-set validation with three scenarios including the identification of a new MRGPRX antagonist.
Collapse
Affiliation(s)
- Surendra Kumar
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
| | - Mahesh K Teli
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
| | - Mi-hyun Kim
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
| |
Collapse
|
2
|
Lee J, Kim M, Han K, Yoon S. StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads. Genes Genomics 2023; 45:1599-1609. [PMID: 37837515 DOI: 10.1007/s13258-023-01458-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 10/01/2023] [Indexed: 10/16/2023]
Abstract
BACKGROUND Reconstruction of amino acid sequences from assembled transcriptome is of interest in personalized medicine, for example, to predict drug-target (or protein-protein) interaction considering individual's genomic variations. Most of the existing transcriptome assemblers, however, seems not well suited for this purpose. METHODS In this work, we present StringFix, an annotation guided transcriptome assembly and protein sequence reconstruction software tool that takes genome-aligned reads and the annotations associated to the reference genome as input. The tool 'fixes' the pre-annotated transcript sequence by taking small variations into account, finally to produce possible amino acid sequences that are likely to exist in the test tissue. RESULTS The results show that, using outputs from existing reference-based assemblers as the input GTF-guide, StringFix could reconstruct amino acid sequences more precisely with higher sensitivity than direct generation using the recovered transcripts from all the assemblers we tested. CONCLUSION By using StringFix with the existing reference-based assemblers, one can recover not only a novel transcripts and isoforms but also the possible amino acid sequence stemming from them.
Collapse
Affiliation(s)
- Joongho Lee
- Dept. of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea
| | - Minsoo Kim
- Dept. of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea
| | - Kyudong Han
- Center for Bio-Medical Engineering Core Facility, Dankook Univ, Cheonan, 31116, Korea
- Dept. of Microbiology, College of Science & Technology, Dankook Univ, Cheonan, 31116, Korea
- HuNbiome Co., Ltd, R&D Center, Seoul, 08503, Korea
| | - Seokhyun Yoon
- Dept. of Electronics and Electrical Engineering, College of Engineering, Dankook Univ, Yongin-si, 16890, Korea.
| |
Collapse
|
3
|
Moon C, Kim D. Prediction of drug-target interactions through multi-task learning. Sci Rep 2022; 12:18323. [PMID: 36316405 PMCID: PMC9622881 DOI: 10.1038/s41598-022-23203-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Accepted: 10/26/2022] [Indexed: 12/31/2022] Open
Abstract
Identifying the binding between the target proteins and molecules is essential in drug discovery. The multi-task learning method has been introduced to facilitate knowledge sharing among tasks when the amount of information for each task is small. However, multi-task learning sometimes worsens the overall performance or generates a trade-off between individual task's performance. In this study, we propose a general multi-task learning scheme that not only increases the average performance but also minimizes individual performance degradation, through group selection and knowledge distillation. The groups are selected on the basis of chemical similarity between ligand sets of targets, and the similar targets in the same groups are trained together. During training, we apply knowledge distillation with teacher annealing. The multi-task learning models are guided by the predictions of the single-task learning models. This method results in higher average performance than that from single-task learning and classic multi-task learning. Further analysis reveals that multi-task learning is particularly effective for low performance tasks, and knowledge distillation helps the model avoid the degradation in individual task performance in multi-task learning.
Collapse
Affiliation(s)
- Chaeyoung Moon
- grid.37172.300000 0001 2292 0500Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
| | - Dongsup Kim
- grid.37172.300000 0001 2292 0500Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
| |
Collapse
|
4
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Collapse
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
| | - Lin-Lin Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
- Correspondence: (L.-L.W.); (L.W.)
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
| | - Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China;
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China;
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
- Correspondence: (L.-L.W.); (L.W.)
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; (L.W.); (Z.-H.Y.)
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| |
Collapse
|
5
|
Drug-Target Interaction Prediction Based on Multisource Information Weighted Fusion. CONTRAST MEDIA & MOLECULAR IMAGING 2021; 2021:6044256. [PMID: 34908912 PMCID: PMC8635946 DOI: 10.1155/2021/6044256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Accepted: 10/22/2021] [Indexed: 01/08/2023]
Abstract
Recently, in most existing studies, it is assumed that there are no interaction relationships between drugs and targets with unknown interactions. However, unknown interactions mean the relationships between drugs and targets have just not been confirmed. In this paper, samples for which the relationship between drugs and targets has not been determined are considered unlabeled. A weighted fusion method of multisource information is proposed to screen drug-target interactions. Firstly, some drug-target pairs which may have interactions are selected. Secondly, the selected drug-target pairs are added to the positive samples, which are regarded as known to have interaction relationships, and the original interaction relationship matrix is revised. Finally, the revised datasets are used to predict the interaction derived from the bipartite local model with neighbor-based interaction profile inferring (BLM-NII). Experiments demonstrate that the proposed method has greatly improved specificity, sensitivity, precision, and accuracy compared with the BLM-NII method. In addition, compared with several state-of-the-art methods, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) of the proposed method are excellent.
Collapse
|
6
|
Jia LN, Yan X, You ZH, Zhou X, Li LP, Wang L, Song KJ. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320984171. [PMID: 33488064 PMCID: PMC7768313 DOI: 10.1177/1176934320984171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022] Open
Abstract
The study of protein self-interactions (SIPs) can not only reveal the function of proteins at the molecular level, but is also crucial to understand activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanism of major diseases. With the rapid advances in biotechnology, a large number of SIPs have been discovered. However, due to the long period and high cost inherent to biological experiments, the gap between the identification of SIPs and the accumulation of data is growing. Therefore, fast and accurate computational methods are needed to effectively predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first understand the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features through the Stacked Auto-Encoder (SAE) algorithm of deep learning. Finally, we fuse the natural language features of proteins with evolutionary features and make accurate predictions by Extreme Learning Machine (ELM) classifier. In the SIPs gold standard data sets of human and yeast, NLPEI achieved 94.19% and 91.29% prediction accuracy. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicated that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
Collapse
Affiliation(s)
- Li-Na Jia
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
| | - Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- School of Foreign Languages, Zaozhuang University, Zaozhuang, China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
| | - Ke-Jian Song
- School of information engineering, Jiangxi University of Science and Technology, Ganzhou, China
| |
Collapse
|