1
Li Y, Zou Q, Dai Q, Stalin A, Luo X. Identifying the DNA methylation preference of transcription factors using ProtBERT and SVM. PLoS Comput Biol 2025; 21:e1012513. [PMID: 40359430 PMCID: PMC12121914 DOI: 10.1371/journal.pcbi.1012513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 05/29/2025] [Accepted: 04/29/2025] [Indexed: 05/15/2025]
Abstract
Transcription factors (TFs) can affect gene expression by binding to specific DNA sequences. This binding process may be modulated by DNA methylation. A subset of TFs that serve as methylation readers preferentially bind to certain methylated DNA; these are defined as TFPMs. The identification of TFPMs enhances our understanding of DNA methylation's role in gene regulation. However, their experimental identification is resource-demanding. In this study, we propose a novel two-step computational approach to classify TFs and TFPMs. First, we employed a fine-tuned ProtBERT model to differentiate between TFs and non-TFs. Second, we combined the Reduced Amino Acid Category (RAAC) with K-mer and SVM to predict the potential of TFs to bind to methylated DNA. Comparative experiments demonstrate that our proposed methods outperform all existing approaches and emphasize the efficiency of our computational framework in classifying TFs and TFPMs. Cross-species validation on an independent mouse dataset further demonstrates the generalizability of our proposed framework. In addition, we conducted predictions on all human transcription factors and found that most of the top 20 proteins belong to the Krueppel C2H2-type Zinc-finger family. So far, some studies have demonstrated a partial correlation between this family and DNA methylation and confirmed the preference of some of its members, thereby showing the robustness of our approach.
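The second step named in the abstract (RAAC combined with K-mer features) can be illustrated with a minimal sketch. The abstract does not give the reduction scheme, the value of k, or the SVM settings, so the six-group alphabet in `GROUPS`, the function names, and the default `k=2` below are all hypothetical choices made only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical 6-group reduced alphabet (an assumption; the paper's actual
# RAAC scheme is not stated in the abstract).
GROUPS = {
    "h": "AVLIMC",  # hydrophobic / aliphatic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
# Map each of the 20 standard amino acids to its group label.
REDUCE = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Non-standard residues (e.g. X) are dropped, which joins their neighbours.
    """
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def kmer_frequencies(seq, k=2):
    """Return normalised k-mer frequencies over the reduced alphabet.

    The vector has len(GROUPS) ** k entries in a fixed, sorted k-mer order.
    """
    reduced = reduce_sequence(seq)
    kmers = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[km] / total for km in kmers]
```

A fixed-length vector of this kind could then be passed to any SVM implementation; which kernel and hyperparameters the authors used is not stated in the abstract.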
Affiliation(s)
- Yanchao Li
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Qi Dai
  - College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, China
- Antony Stalin
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Ximei Luo
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
2
Lyu Y, Xiong T, Shi S, Wang D, Yang X, Liu Q, Li Z, Li Z, Wang C, Chen R. Prediction of the Trimer Protein Interface Residue Pair by CNN-GRU Model Based on Multi-Feature Map. Nanomaterials (Basel, Switzerland) 2025; 15:188. [PMID: 39940164 PMCID: PMC11821012 DOI: 10.3390/nano15030188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2024] [Revised: 01/21/2025] [Accepted: 01/22/2025] [Indexed: 02/14/2025]
Abstract
Most life activities of organisms are realized through protein-protein interactions, and these interactions are mainly achieved through residue-residue contact between monomer proteins. Consequently, studying residue-residue contact at the protein interaction interface can contribute to a deeper understanding of the protein-protein interaction mechanism. In this paper, we focus on interface residue pairs in trimer proteins. First, we utilize the amino acid k-interval product factor descriptor (AAIPF(k)) to integrate the positional information and physicochemical properties of amino acids, combined with the electric properties and geometric shape features of residues, to construct an 8 × 16 multi-feature map. This multi-feature map represents a sample composed of two residues on a trimer protein. Second, we construct a CNN-GRU deep learning framework to predict the trimer protein interface residue pair. The results show that when each dimer protein provides 10 prediction results and two protein-protein interaction interfaces of a trimer protein need to be accurately predicted, the accuracy of our proposed method is 60%. When each dimer protein provides 10 prediction results and one protein-protein interaction interface of a trimer protein needs to be accurately predicted, the accuracy of our proposed method is 93%. Our results can provide experimental researchers with a limited yet precise dataset containing correct trimer protein interface residue pairs, which is of great significance in guiding the experimental resolution of the trimer protein three-dimensional structure. Furthermore, compared to other computational methods, our proposed approach exhibits superior performance in predicting residue-residue contact at the trimer protein interface.
Affiliation(s)
- Yanfen Lyu
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
  - Key Laboratory of Manufacture Technology of Veterinary Bioproducts, Ministry of Agriculture and Rural Affairs, Zhaoqing Dahuanong Biology Medicine Co., Ltd., Zhaoqing 526238, China
- Ting Xiong
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - Zhaoqing Branch of Guangdong Laboratory of Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
- Shuaibo Shi
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Dong Wang
  - School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, China
- Xueqing Yang
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Qihuan Liu
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Zhengtan Li
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Zhixin Li
  - School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
- Chunxia Wang
  - College of Landscape and Ecological Engineering, Hebei University of Engineering, Handan 056038, China
- Ruiai Chen
  - College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
  - Key Laboratory of Manufacture Technology of Veterinary Bioproducts, Ministry of Agriculture and Rural Affairs, Zhaoqing Dahuanong Biology Medicine Co., Ltd., Zhaoqing 526238, China
  - Zhaoqing Branch of Guangdong Laboratory of Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
3
Tayebi Z, Ali S, Patterson M. TCellR2Vec: efficient feature selection for TCR sequences for cancer classification. PeerJ Comput Sci 2024; 10:e2239. [PMID: 39650499 PMCID: PMC11622898 DOI: 10.7717/peerj-cs.2239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 07/14/2024] [Indexed: 12/11/2024]
Abstract
Cancer remains one of the leading causes of death globally. New immunotherapies that harness the patient's immune system to fight cancer show promise, but their development requires analyzing the diversity of immune cells called T-cells. T-cells have receptors that recognize and bind to cancer cells. Sequencing these T-cell receptors can provide insights into the immune response, but extracting useful information is challenging. In this study, we propose a new computational method, TCellR2Vec, to select key features from T-cell receptor sequences for classifying different cancer types. We extracted features like amino acid composition, charge, and diversity measures and combined them with other sequence embedding techniques. For our experiments, we used a dataset of over 50,000 T-cell receptor sequences from five cancer types, which showed that TCellR2Vec improved classification accuracy and efficiency over baseline methods. These results demonstrate TCellR2Vec's ability to capture informative aspects of complex T-cell receptor sequences. By improving computational analysis of the immune response, TCellR2Vec could aid the development of personalized immunotherapies tailored to each patient's T-cells. This has important implications for creating more effective cancer treatments based on the individual's immune system.
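The kinds of features the abstract lists (amino acid composition, charge, and a diversity measure) can be sketched for a single receptor sequence as follows. This is not the TCellR2Vec implementation: the function name, the simplified charge table, and the use of Shannon entropy as the diversity measure are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Simplified side-chain charges (an assumption; histidine is treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def sequence_features(seq):
    """Return a 22-element feature vector for one amino acid sequence.

    Layout: 20 composition fractions (in AMINO_ACIDS order), then the net
    charge, then the Shannon entropy (in bits) of the residue distribution.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    net_charge = sum(CHARGE.get(aa, 0) for aa in seq)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return composition + [net_charge, entropy]
```

Vectors like this could be concatenated with a sequence embedding before classification, in the spirit of the combination the abstract describes; the specific embeddings and classifiers used are not given there.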
Affiliation(s)
- Zahra Tayebi
  - Computer Science, Georgia State University, Atlanta, GA, United States of America
- Sarwan Ali
  - Computer Science, Georgia State University, Atlanta, GA, United States of America
- Murray Patterson
  - Computer Science, Georgia State University, Atlanta, GA, United States of America
4
Zhang L. User emotion recognition and indoor space interaction design: a CNN model optimized by multimodal weighted networks. PeerJ Comput Sci 2024; 10:e2450. [PMID: 39650496 PMCID: PMC11623009 DOI: 10.7717/peerj-cs.2450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 10/04/2024] [Indexed: 12/11/2024]
Abstract
In interior interaction design, achieving intelligent user-interior interaction is contingent upon understanding the user's emotional responses, so precise identification of the user's visual emotions is essential. Current visual emotion recognition methods rely on a single feature type, predominantly facial expressions, resulting in inadequate coverage of visual characteristics and low recognition rates. This study introduces a deep learning-based multimodal weighting network model to address this challenge. The model begins with a convolutional attention module, employing a self-attention mechanism within a convolutional neural network (CNN). Subsequently, the multimodal weighting network model is integrated to optimize weights during training. Finally, a weight network classifier is derived from these optimized weights to facilitate visual emotion recognition. Experimental outcomes reveal a 77.057% correctness rate and a 74.75% accuracy rate in visual emotion recognition. Comparative analysis against existing models demonstrates the superiority of the multimodal weight network model, showcasing its potential to enhance human-centric and intelligent indoor interaction design.
Affiliation(s)
- Lingyu Zhang
  - Space Lifestyle Design, Kookmin University, Seoul, Republic of South Korea
5
Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024; 85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]
Abstract
Protein-protein interactions play crucial roles in many biological processes. Traditionally, protein complex structures are built by protein-protein docking. With the rapid development of artificial intelligence and its great success in monomer protein structure prediction, deep learning has been widely applied in the past few years to modeling protein-protein complex structures through inter-protein contact prediction and end-to-end approaches. This article reviews the recent advances of deep-learning-based approaches in modeling protein-protein complex structures as well as their advantages and limitations. Challenges and possible future directions in applying deep learning to the prediction of protein complex structures are also briefly discussed.
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Hao Li
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China