1
|
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2025; 67:862-884. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
Collapse
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
| | - Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India.
| |
Collapse
|
2
|
Li Z, Zhou H, Xu G, Zhang P, Zhai N, Zheng Q, Liu P, Jin L, Bai G, Zhang H. Genome-wide analysis of long noncoding RNAs in response to salt stress in Nicotiana tabacum. BMC PLANT BIOLOGY 2023; 23:646. [PMID: 38097981 PMCID: PMC10722832 DOI: 10.1186/s12870-023-04659-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 12/04/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) have been shown to play important roles in the response of plants to various abiotic stresses, including drought, heat and salt stress. However, the identification and characterization of genome-wide salt-responsive lncRNAs in tobacco (Nicotiana tabacum L.) have been limited. Therefore, this study aimed to identify tobacco lncRNAs in roots and leaves in response to different durations of salt stress treatment. RESULTS A total of 5,831 lncRNAs were discovered, with 2,428 classified as differentially expressed lncRNAs (DElncRNAs) in response to salt stress. Among these, only 214 DElncRNAs were shared between the 2,147 DElncRNAs in roots and the 495 DElncRNAs in leaves. KEGG pathway enrichment analysis revealed that these DElncRNAs were primarily associated with pathways involved in starch and sucrose metabolism in roots and cysteine and methionine metabolism pathway in leaves. Furthermore, weighted gene co-expression network analysis (WGCNA) identified 15 co-expression modules, with four modules strongly linked to salt stress across different treatment durations (MEsalmon, MElightgreen, MEgreenyellow and MEdarkred). Additionally, an lncRNA-miRNA-mRNA network was constructed, incorporating several known salt-associated miRNAs such as miR156, miR169 and miR396. CONCLUSIONS This study enhances our understanding of the role of lncRNAs in the response of tobacco to salt stress. It provides valuable information on co-expression networks of lncRNA and mRNAs, as well as networks of lncRNAs-miRNAs-mRNAs. These findings identify important candidate lncRNAs that warrant further investigation in the study of plant-environment interactions.
Collapse
Affiliation(s)
- Zefeng Li
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Huina Zhou
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Guoyun Xu
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Peipei Zhang
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
| | - Niu Zhai
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Qingxia Zheng
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Pingping Liu
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Lifeng Jin
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China
- Beijing Life Science Academy (BLSA), Beijing, China
| | - Ge Bai
- National Tobacco Genetic Engineering Research Center, Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan, China.
| | - Hui Zhang
- China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 45000, China.
- Beijing Life Science Academy (BLSA), Beijing, China.
| |
Collapse
|
3
|
Shao J, Zhang Q, Yan K, Liu B. PreHom-PCLM: protein remote homology detection by combing motifs and protein cubic language model. Brief Bioinform 2023; 24:bbad347. [PMID: 37833837 DOI: 10.1093/bib/bbad347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 08/14/2023] [Accepted: 09/14/2023] [Indexed: 10/15/2023] Open
Abstract
Protein remote homology detection is essential for structure prediction, function prediction, disease mechanism understanding, etc. The remote homology relationship depends on multiple protein properties, such as structural information and local sequence patterns. Previous studies have shown the challenges for predicting remote homology relationship by protein features at sequence level (e.g. position-specific score matrix). Protein motifs have been used in structure and function analysis due to their unique sequence patterns and implied structural information. Therefore, designing a usable architecture to fuse multiple protein properties based on motifs is urgently needed to improve protein remote homology detection performance. To make full use of the characteristics of motifs, we employed the language model called the protein cubic language model (PCLM). It combines multiple properties by constructing a motif-based neural network. Based on the PCLM, we proposed a predictor called PreHom-PCLM by extracting and fusing multiple motif features for protein remote homology detection. PreHom-PCLM outperforms the other state-of-the-art methods on the test set and independent test set. Experimental results further prove the effectiveness of multiple features fused by PreHom-PCLM for remote homology detection. Furthermore, the protein features derived from the PreHom-PCLM show strong discriminative power for proteins from different structural classes in the high-dimensional space. Availability and Implementation: http://bliulab.net/PreHom-PCLM.
Collapse
Affiliation(s)
- Jiangyi Shao
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Qi Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
Collapse
|
4
|
Qiu XY, Wu H, Shao J. TALE-cmap: Protein function prediction based on a TALE-based architecture and the structure information from contact map. Comput Biol Med 2022; 149:105938. [DOI: 10.1016/j.compbiomed.2022.105938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2022] [Revised: 07/26/2022] [Accepted: 08/06/2022] [Indexed: 11/03/2022]
|
5
|
Jin X, Luo X, Liu B. PHR-search: a search framework for protein remote homology detection based on the predicted protein hierarchical relationships. Brief Bioinform 2022; 23:6520306. [PMID: 35134113 DOI: 10.1093/bib/bbab609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Revised: 12/14/2021] [Accepted: 12/30/2021] [Indexed: 11/13/2022] Open
Abstract
Protein remote homology detection is one of the most fundamental research tool for protein structure and function prediction. Most search methods for protein remote homology detection are evaluated based on the Structural Classification of Proteins-extended (SCOPe) benchmark, but the diverse hierarchical structure relationships between the query protein and candidate proteins are ignored by these methods. In order to further improve the predictive performance for protein remote homology detection, a search framework based on the predicted protein hierarchical relationships (PHR-search) is proposed. In the PHR-search framework, the superfamily level prediction information is obtained by extracting the local and global features of the Hidden Markov Model (HMM) profile through a convolution neural network and it is converted to the fold level and class level prediction information according to the hierarchical relationships of SCOPe. Based on these predicted protein hierarchical relationships, filtering strategy and re-ranking strategy are used to construct the two-level search of PHR-search. Experimental results show that the PHR-search framework achieves the state-of-the-art performance by employing five basic search methods, including HHblits, JackHMMER, PSI-BLAST, DELTA-BLAST and PSI-BLASTexB. Furthermore, the web server of PHR-search is established, which can be accessed at http://bliulab.net/PHR-search.
Collapse
Affiliation(s)
- Xiaopeng Jin
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.,School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Xiaoling Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
Collapse
|
6
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive or negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
Collapse
Affiliation(s)
- Fuyi Li
- Monash University, Australia
| | | | - André Leier
- Department of Genetics, UAB School of Medicine, USA
| | - Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Jing Xu
- Computer Science and Technology from Nankai University, China
| | - Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
| | - Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, Australia
| | - Yang Zhang
- Northwestern Polytechnical University, China
| | - Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
| | - Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry of Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
| |
Collapse
|