1
Hu Y, Chen Y, Qin Y, Huang R. Learning entity-oriented representation for biomedical relation extraction. J Biomed Inform 2023; 147:104527. [PMID: 37852347 DOI: 10.1016/j.jbi.2023.104527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 10/11/2023] [Accepted: 10/15/2023] [Indexed: 10/20/2023]
Abstract
Biomedical Relation Extraction (BioRE) aims to automatically extract semantic relations for given entity pairs and is of great significance in biomedical research. Current popular methods often utilize pretrained language models to extract semantic features from individual input instances, which frequently suffer from overlapping semantics. Overlapping semantics refers to the situation in which a sentence contains multiple entity pairs that share the same context, leading to highly similar information between these entity pairs. In this study, we propose a model for learning Entity-oriented Representation (EoR) that aims to improve the performance of the model by enhancing the discriminability between entity pairs. It contains three modules: sentence representation, entity-oriented representation, and output. The first module learns the global semantic information of the input instance; the second module focuses on extracting the semantic information of the sentence from the target entities; and the third module enhances distinguishability among entity pairs and classifies the relation type. We evaluated our approach on four BioRE tasks with eight datasets, and the experiments showed that our EoR achieved state-of-the-art performance for PPI, DDI, CPI, and DPI tasks. Further analysis demonstrated the benefits of entity-oriented semantic information in handling multiple entity pairs in the BioRE task.
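The overlapping-semantics situation defined in this abstract can be illustrated with a minimal sketch. It is not the EoR model: the example sentence, the token spans, the helper name `mark_entity_pairs`, and the `<e1>`/`<e2>` marker scheme are all assumptions chosen for illustration. The sketch only shows that one sentence with three entities yields three instances that share identical context and differ solely in which entities are marked.

```python
from itertools import combinations

MARKERS = {"<e1>", "</e1>", "<e2>", "</e2>"}


def mark_entity_pairs(tokens, entities):
    """Return one marked copy of the sentence per unordered entity pair.

    tokens   -- list of word strings
    entities -- list of (start, end) token spans, end exclusive, non-overlapping
    """
    instances = []
    # Sorting the spans guarantees the first span of each pair comes first in the text.
    for (s1, e1), (s2, e2) in combinations(sorted(entities), 2):
        marked = (
            tokens[:s1] + ["<e1>"] + tokens[s1:e1] + ["</e1>"]
            + tokens[e1:s2] + ["<e2>"] + tokens[s2:e2] + ["</e2>"]
            + tokens[e2:]
        )
        instances.append(" ".join(marked))
    return instances


# Hypothetical example sentence with three single-token entities.
sentence = "Aspirin may increase the effect of warfarin and heparin".split()
spans = [(0, 1), (6, 7), (8, 9)]
instances = mark_entity_pairs(sentence, spans)
for inst in instances:
    print(inst)
```

Every instance reduces to the same sentence once the markers are stripped, which is why a representation built only from the shared context struggles to tell the pairs apart.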
Affiliation(s)
- Ying Hu
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Yanping Chen
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Yongbin Qin
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Ruizhang Huang
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
2
Wang W, Meng X, Xiang J, Shuai Y, Bedru HD, Li M. CACO: A Core-Attachment Method With Cross-Species Functional Ortholog Information to Detect Human Protein Complexes. IEEE J Biomed Health Inform 2023; 27:4569-4578. [PMID: 37399160 DOI: 10.1109/jbhi.2023.3289490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/05/2023]
Abstract
Protein complexes play an essential role in living cells. Detecting protein complexes is crucial for understanding protein functions and treating complex diseases. Because experimental approaches are time- and resource-intensive, many computational approaches have been proposed to detect protein complexes. However, most of them rely only on protein-protein interaction (PPI) networks and thus suffer heavily from the noise in those networks. Therefore, we propose a novel core-attachment method, named CACO, to detect human protein complexes by integrating functional information from other species via protein ortholog relations. First, CACO constructs a cross-species ortholog relation matrix and transfers GO terms from other species as a reference to evaluate the confidence of PPIs. Then, a PPI filtering strategy is applied to clean the PPI network, yielding a weighted, cleaned PPI network. Finally, a new, effective core-attachment algorithm is proposed to detect protein complexes from the weighted PPI network. Compared with thirteen other state-of-the-art methods, CACO outperforms all of them in terms of F-measure and Composite Score, showing that both the integration of ortholog information and the proposed core-attachment algorithm are effective in detecting protein complexes.
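The weight, filter, and attach steps named in this abstract can be sketched generically as follows. This is not the CACO algorithm: the abstract does not give its confidence score or attachment rule, so the sketch assumes Jaccard similarity of GO term sets as the edge weight, a fixed threshold for filtering, a core that is supplied rather than detected, and a "linked to at least half of the core" attachment rule. The protein names and GO identifiers are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_and_filter(edges, go_terms, threshold):
    """Weight each PPI by GO-term overlap and drop low-confidence edges.

    Assumption: Jaccard overlap stands in for the confidence score.
    """
    weighted = {}
    for u, v in edges:
        w = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        if w >= threshold:
            weighted[frozenset((u, v))] = w
    return weighted


def attach(core, weighted, min_fraction=0.5):
    """Grow a core with outside proteins linked to enough core members.

    Assumption: a protein is attached when it has retained edges to at
    least `min_fraction` of the core proteins.
    """
    candidates = {p for edge in weighted for p in edge} - core
    attachments = set()
    for p in candidates:
        links = sum(1 for c in core if frozenset((p, c)) in weighted)
        if links / len(core) >= min_fraction:
            attachments.add(p)
    return core | attachments


# Toy network with hypothetical proteins and GO annotations.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
go_terms = {
    "A": {"GO:1", "GO:2"},
    "B": {"GO:1", "GO:2"},
    "C": {"GO:1", "GO:2", "GO:3"},
    "D": {"GO:2", "GO:3"},
    "E": {"GO:9"},
}
weighted = weight_and_filter(edges, go_terms, threshold=0.3)
predicted = attach({"A", "B", "C"}, weighted)
print(sorted(predicted))
```

In this toy network the D-E edge has no shared annotation and is filtered out as noise, while D has retained edges to two of the three core proteins and is therefore attached.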
3
Pan Y, Wang Y, Guan J, Zhou S. PCGAN: a generative approach for protein complex identification from protein interaction networks. Bioinformatics 2023; 39:btad473. [PMID: 37531266 PMCID: PMC10457665 DOI: 10.1093/bioinformatics/btad473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/20/2022] [Revised: 07/23/2023] [Accepted: 08/01/2023] [Indexed: 08/04/2023] Open
Abstract
MOTIVATION: Protein complexes are groups of polypeptide chains linked by non-covalent protein-protein interactions. They play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks by mining dense subnetworks or subgraphs. RESULTS: In this article, unlike existing work, we propose a novel approach to this task based on generative adversarial networks, called PCGAN (identifying Protein Complexes by GAN). Using some real complexes as training samples, our method learns a model that generates new complexes from a protein interaction network. To effectively support model training and testing, we construct two more comprehensive and reliable protein interaction networks and a larger gold-standard complex set by merging existing ones from the same organism (human and yeast). Extensive comparison studies indicate that our method is superior to existing protein complex identification methods on various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, which indicates that the generated protein complexes are very likely real complexes. AVAILABILITY AND IMPLEMENTATION: https://github.com/yul-pan/PCGAN.
Affiliation(s)
- Yuliang Pan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Yang Wang
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
  - Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
4
Zhan Y, Liu J, Wu M, Tan CSH, Li X, Ou-Yang L. A partially shared joint clustering framework for detecting protein complexes from multiple state-specific signed interaction networks. Comput Biol Med 2023; 159:106936. [PMID: 37105110 DOI: 10.1016/j.compbiomed.2023.106936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/27/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
Detecting protein complexes is critical for studying cellular organization and function. The accumulation of protein-protein interaction (PPI) data enables the computational identification of protein complexes. Although a great number of computational methods have been proposed to identify protein complexes from PPI networks, most of them ignore the signs of PPIs, which reflect the ways proteins interact (activation or inhibition). As not all PPIs imply co-complex relationships, taking the signs of PPIs into account can benefit the identification of protein complexes. Moreover, PPI networks are not static but vary with changes in cell state or environment. However, existing methods are primarily designed for single-network clustering and rarely consider joint clustering of multiple PPI networks. In this study, we propose a novel partially shared signed network clustering (PS-SNC) model for jointly identifying protein complexes from multiple state-specific signed PPI networks. PS-SNC not only considers the signs of PPIs but also identifies the common and unique protein complexes in different states. Experimental results on synthetic and real datasets show that our PS-SNC model achieves better performance than other state-of-the-art protein complex detection methods. Extensive analysis on real datasets demonstrates the effectiveness of PS-SNC in revealing novel insights into the underlying patterns of different cell lines.
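The input this abstract describes, several state-specific signed PPI networks with partly shared structure, can be pictured with a small data-structure sketch. It is not the PS-SNC clustering model. It assumes each network is a dictionary mapping an undirected edge to +1 (activation) or -1 (inhibition), assumes at least one state is given, and uses a hypothetical helper `split_signed_edges` with made-up state and protein names. An edge counts as shared only if it appears with the same sign in every state.

```python
def split_signed_edges(networks):
    """Split signed edges into those shared by all states and the rest.

    networks -- dict: state name -> {frozenset({u, v}): +1 or -1}
    Returns (shared, specific), where `specific` maps each state to the
    edges that are absent, or have a different sign, in some other state.
    """
    states = list(networks)
    first = networks[states[0]]
    shared = {
        edge: sign
        for edge, sign in first.items()
        if all(networks[s].get(edge) == sign for s in states[1:])
    }
    specific = {
        s: {e: sg for e, sg in net.items() if e not in shared}
        for s, net in networks.items()
    }
    return shared, specific


def edge(u, v):
    """Undirected edge key."""
    return frozenset((u, v))


# Two hypothetical cell states over proteins A-E.
networks = {
    "state1": {edge("A", "B"): +1, edge("B", "C"): -1, edge("C", "D"): +1},
    "state2": {edge("A", "B"): +1, edge("B", "C"): +1, edge("D", "E"): -1},
}
shared, specific = split_signed_edges(networks)
print(len(shared), {s: len(e) for s, e in specific.items()})
```

Here the B-C interaction is present in both states but flips sign, so it is treated as state-specific rather than shared; an unsigned representation would not capture that difference.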
Affiliation(s)
- Youlin Zhan
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Jiahan Liu
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Min Wu
  - Institute for Infocomm Research (I2R), Agency of Science, Technology, and Research (A*STAR), 138632, Singapore
- Chris Soon Heng Tan
  - Department of Chemistry, College of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Xiaoli Li
  - Institute for Infocomm Research (I2R), Agency of Science, Technology, and Research (A*STAR), 138632, Singapore
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518129, China.
5
Wang X, Zhang N, Zhao Y, Wang J. A New Method for Recognizing Protein Complexes Based on Protein Interaction Networks and GO Terms. Front Genet 2021; 12:792265. [PMID: 34966415 PMCID: PMC8711776 DOI: 10.3389/fgene.2021.792265] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2021] [Accepted: 11/10/2021] [Indexed: 01/29/2023] Open
Abstract
Motivation: A protein complex is a group of proteins that interact with each other. Protein–protein interaction (PPI) networks are composed of multiple protein complexes. Recognizing protein complexes from PPI data is very difficult because the data are noisy. Results: We propose a new method, called Topology and Semantic Similarity Network (TSSN), that constructs the PPI network from both topological structure characteristics and biological characteristics. Experiments show that TSSN can filter the noise in PPI data. We also propose a new algorithm, called Neighbor Nodes of Proteins (NNP), that recognizes protein complexes by considering their topology information. Experiments show that the algorithm identifies more protein complexes and identifies them more accurately. The recognition of protein complexes is vital in research on evolution analysis. Availability and implementation: https://github.com/bioinformatical-code/NNP.
Affiliation(s)
- Xiaoting Wang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Nan Zhang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Yulan Zhao
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Juan Wang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
6
Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020; 22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022] Open
Abstract
Protein complexes play important roles in most cellular processes. The available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a large proportion of false-positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a unified interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecular interactions, their predicted protein complexes often contain many false-positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from an RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions, and PPIs. By considering interactions among proteins and RNAs, the new method reduces the proportion of false-positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms other state-of-the-art methods in this field.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Shixi Fan
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China