1
Kang Y, Jin K, Pan L. AI designed, mutation resistant broad neutralizing antibodies against multiple SARS-CoV-2 strains. Sci Rep 2025; 15:15533. [PMID: 40319133 PMCID: PMC12049519 DOI: 10.1038/s41598-025-98979-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Accepted: 04/16/2025] [Indexed: 05/07/2025] Open
Abstract
In this study, we developed a digital twin for SARS-CoV-2 by integrating diverse data and metadata with multiple data types and processing strategies, including machine learning, natural language processing, protein structural modeling, and protein sequence language modeling. This approach enabled us to computationally design neutralizing antibodies against over 1300 historical strains of SARS-CoV-2, encompassing 64 mutations in the receptor binding domain (RBD) region. Seventy AI-designed antibodies were experimentally validated through binding assays and real viral neutralization assays against various strains, including later Omicron strains not present in the initial design database. Of these antibodies, 14% exhibited strong reactivity against the RBD of multiple strains, achieving triple cross-binding hit rates in ELISA assays. Ten antibodies neutralized the cytopathic effects (CPE) of the Delta strain at IC50 values of < 10 µg/ml, and one antibody neutralized the CPE of Omicron. These findings demonstrate the potential of our approach to influence future therapeutic design for existing virus strains and to predict hidden patterns in viral evolution that AI can leverage to develop emerging antiviral treatments.
Affiliation(s)
- Yue Kang
  - Ainnocence Inc., Suite B PMB 1147, Mountain View, CA, 94040, USA
- Kevin Jin
  - Ainnocence Inc., Suite B PMB 1147, Mountain View, CA, 94040, USA
- Lurong Pan
  - Ainnocence Inc., Suite B PMB 1147, Mountain View, CA, 94040, USA.

2
Yadalam PK, Arumuganainar D, Natarajan PM, Ardila CM. Artificial intelligence-powered prediction of AIM-2 inflammasome sequences using transformers and graph attention networks in periodontal inflammation. Sci Rep 2025; 15:8733. [PMID: 40082687 PMCID: PMC11906867 DOI: 10.1038/s41598-025-93409-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Accepted: 03/06/2025] [Indexed: 03/16/2025] Open
Abstract
Periodontal inflammation is a chronic condition affecting the tissues surrounding teeth. Initiated by dental plaque, it triggers an immune response leading to tissue destruction. The AIM-2 inflammasome regulates this response, and understanding its peptide sequences could aid in developing targeted therapeutics. This study explores using transformers and graph attention networks (GAT) to treat periodontal inflammation. UniProt was used to download AIM-2 inflammasome proteins and FASTA sequences with 100%, 90%, and 50% similarity. DeepBio, a web service for developing deep-learning architectures, analyzed these sequences. Peptide sequence prediction methods were evaluated using a transformer, RNN-CNN, and GAT models. The transformer model achieved 84% accuracy, the GAT model 86%, and the RNN-CNN 64%. Both transformer and GAT models predicted peptide sequences more effectively than the RNN-CNN model, with the Transformer showing the highest class accuracy at 85%, followed by the GAT model at 80%. Models exhibited varying sensitivity and specificity, with the Transformer demonstrating superior performance in overall and class-specific peptide sequence prediction. AI-based peptide sequence prediction using transformers, GAT, and RNN-CNN shows promise for accurately predicting AIM-2 peptide sequences, with transformers and GAT outperforming RNN-CNN in accuracy and class accuracy.
Affiliation(s)
- Pradeep Kumar Yadalam
  - Department of Periodontics, Saveetha Institute of Medical and Technical Sciences, Saveetha Dental College and Hospital, Saveetha University, Chennai, Tamil Nadu, 600077, India
- Deepavalli Arumuganainar
  - Department of Periodontics, Saveetha Institute of Medical and Technical Sciences, Saveetha Dental College and Hospital, Saveetha University, Chennai, Tamil Nadu, 600077, India
- Prabhu Manickam Natarajan
  - Department of Clinical Sciences, Center of Medical and Bio-allied Health Sciences and Research, College of Dentistry, Ajman University, Ajman, 346, United Arab Emirates.
- Carlos M Ardila
  - Department of Basic Sciences, Faculty of Dentistry, Universidad de Antioquia U de A, Medellín, 050010, Colombia.

3
Orouji S, Liu MC, Korem T, Peters MAK. Domain adaptation in small-scale and heterogeneous biological datasets. Sci Adv 2024; 10:eadp6040. [PMID: 39705361 DOI: 10.1126/sciadv.adp6040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 11/15/2024] [Indexed: 12/22/2024]
Abstract
Machine-learning models are key to modern biology, yet models trained on one dataset often do not generalize to datasets from other cohorts or laboratories because of both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets are smaller, have more features, and are complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for the incorporation of domain adaptation techniques into the computational biologist's toolkit, with further development of customized approaches.
Affiliation(s)
- Seyedmehdi Orouji
  - Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA
- Martin C Liu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Tal Korem
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
  - CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, Canada
- Megan A K Peters
  - Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA
  - CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, Canada
  - CIFAR Fellow, Program in Brain, Mind, & Consciousness, CIFAR, Toronto, Canada

4
Zheng Y, Li Q, Freiberger MI, Song H, Hu G, Zhang M, Gu R, Li J. Predicting the Dynamic Interaction of Intrinsically Disordered Proteins. J Chem Inf Model 2024; 64:6768-6777. [PMID: 39163306 DOI: 10.1021/acs.jcim.4c00930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024]
Abstract
Intrinsically disordered proteins (IDPs) participate in various biological processes. Interactions involving IDPs are usually dynamic and are affected by their inherent conformation fluctuations. Comprehensive characterization of these interactions based on current techniques is challenging. Here, we present GSALIDP, a GraphSAGE-embedded LSTM network, to capture the dynamic nature of IDP-involved interactions and predict their behaviors. This framework models multiple conformations of IDP as a dynamic graph, which can effectively describe the fluctuation of its flexible conformation. The dynamic interaction between IDPs is studied, and the data sets of IDP conformations and their interactions are obtained through atomistic molecular dynamic (MD) simulations. Residues of IDP are encoded through a series of features including their frustration. GSALIDP can effectively predict the interaction sites of IDP and the contact residue pairs between IDPs. Its performance in predicting IDP interactions is on par with or even better than the conventional models in predicting the interaction of structural proteins. To the best of our knowledge, this is the first model to extend the protein interaction prediction to IDP-involved interactions.
Affiliation(s)
- Yuchuan Zheng
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Qixiu Li
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Maria I Freiberger
  - Protein Physiology Lab, Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires C1428EGA, Argentina
- Haoyu Song
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Guorong Hu
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Moxin Zhang
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Ruoxu Gu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Jingyuan Li
  - School of Physics, Zhejiang University, Hangzhou 310058, PR China

5
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China.
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China.
  - School of Mathematics, Renmin University of China, Beijing, 100872, China.
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China.

6
Ke J, Zhao J, Li H, Yuan L, Dong G, Wang G. Prediction of protein N-terminal acetylation modification sites based on CNN-BiLSTM-attention model. Comput Biol Med 2024; 174:108330. [PMID: 38588617 DOI: 10.1016/j.compbiomed.2024.108330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/06/2024] [Accepted: 03/17/2024] [Indexed: 04/10/2024]
Abstract
N-terminal acetylation is one of the most common and important post-translational modifications (PTM) of eukaryotic proteins. PTM plays a crucial role in various cellular processes and disease pathogenesis. Thus, the accurate identification of N-terminal acetylation modifications is important to gain insight into cellular processes and other possible functional mechanisms. Although some algorithmic models have been proposed, most were developed with traditional machine learning algorithms and small training datasets, which limits their practical application. Deep learning models, by contrast, are better at handling high-throughput and complex data. In this study, DeepCBA, a deep learning model based on a hybrid framework of a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism, was constructed to detect N-terminal acetylation sites. DeepCBA was built as follows: First, a benchmark dataset was generated by selecting low-redundancy protein sequences from the UniProt database and further reducing the redundancy of the protein sequences using the CD-HIT tool. Subsequently, based on the skip-gram model in the word2vec algorithm, tripeptide word vector features were generated on the benchmark dataset. Finally, the CNN, BiLSTM, and attention mechanism were combined, and the tripeptide word vector features were fed into the stacked model for multiple rounds of training. The model performed excellently on an independent test dataset, with accuracy and area under the curve of 80.51% and 87.36%, respectively. Altogether, DeepCBA achieved superior performance compared with the baseline model, and significantly outperformed most existing predictors. Additionally, our model can be used to identify disease loci and drug targets.
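The tripeptide "word" step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the tripeptides are cut from each sequence, so the overlapping window, the `stride` parameter, and the function name `tripeptide_tokens` are assumptions made for illustration only; the resulting token lists are the kind of input a skip-gram word2vec trainer would consume.

```python
def tripeptide_tokens(sequence, stride=1):
    """Split a protein sequence into tripeptide 'words'.

    Assumption: overlapping windows of length 3 (stride=1). The abstract
    does not state the windowing scheme used by DeepCBA.
    """
    sequence = sequence.strip().upper()
    # Stop at len - 2 so every window contains exactly three residues.
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, stride)]


# Hypothetical 7-residue sequence, used only to show the tokenization.
tokens = tripeptide_tokens("MSTAPLK")
print(tokens)
```

Each protein then becomes a "sentence" of tripeptide tokens, which is the shape of corpus that word-embedding tools expect.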
Affiliation(s)
- Jinsong Ke
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Jianmei Zhao
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
  - College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Hongfei Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
  - College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, Quzhou, 324000, China
- Guanghui Dong
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China.

7
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and due to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is to identify near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for methodological and/or practical implications. We present a novel deep learning-based scoring function for ranking protein-ligand docking poses based on Vision Transformer (ViT), named ViTScore. To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interactional pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. ViTScore then outputs a prediction of the root mean square deviation (RMSD) of a docking pose with reference to the native binding pose. ViTScore is extensively evaluated on diverse test sets including PDBbind2019 and CASF2016, and obtains significant improvements over existing methods in terms of RMSE, R and docking power. Moreover, the results demonstrate that ViTScore is a promising scoring function for protein-ligand docking, and it can be used to accurately identify near-native poses from a set of poses. Additionally, ViTScore can be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
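The voxelization idea in this abstract can be sketched in a few lines. This is a simplified illustration, not the ViTScore implementation: it counts atoms per voxel instead of computing a smooth occupancy contribution, and the class names, box size, resolution, and the function name `voxelize_pocket` are all assumptions.

```python
def voxelize_pocket(atoms, center, box_size=4.0, resolution=1.0,
                    classes=("hydrophobic", "polar")):
    """Build one cubic occupancy grid per physicochemical class.

    atoms: iterable of (x, y, z, atom_class) tuples, coordinates in angstroms.
    Assumption: simple per-voxel atom counts stand in for the occupancy
    contribution described in the abstract.
    """
    n = int(round(box_size / resolution))          # voxels per axis
    origin = [c - box_size / 2.0 for c in center]  # lower corner of the box
    grid = {cls: [[[0.0] * n for _ in range(n)] for _ in range(n)]
            for cls in classes}
    for x, y, z, cls in atoms:
        if cls not in grid:
            continue  # skip classes this sketch does not model
        idx = [int((coord - o) // resolution)
               for coord, o in zip((x, y, z), origin)]
        if all(0 <= i < n for i in idx):           # drop atoms outside the box
            i, j, k = idx
            grid[cls][i][j][k] += 1.0
    return grid


# Hypothetical atoms: two inside the 4 A box, one far outside it.
example_atoms = [(0.5, 0.5, 0.5, "polar"),
                 (-1.5, -1.5, -1.5, "hydrophobic"),
                 (10.0, 0.0, 0.0, "polar")]
grid = voxelize_pocket(example_atoms, center=(0.0, 0.0, 0.0))
print(grid["polar"][2][2][2], grid["hydrophobic"][0][0][0])
```

Stacking the per-class grids gives a multi-channel 3D tensor, which is the kind of input a 3D vision model can split into patches.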

8
Lin P, Yan Y, Tao H, Huang SY. Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes. Nat Commun 2023; 14:4935. [PMID: 37582780 PMCID: PMC10427616 DOI: 10.1038/s41467-023-40426-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/27/2023] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
Membrane proteins are encoded by approximately a quarter of human genes. Inter-chain residue-residue contact information is important for structure prediction of membrane protein complexes and valuable for understanding their molecular mechanism. Although many deep learning methods have been proposed to predict the intra-protein contacts or helix-helix interactions in membrane proteins, it is still challenging to accurately predict their inter-chain contacts due to the limited number of transmembrane proteins. Addressing the challenge, here we develop a deep transfer learning method for predicting inter-chain contacts of transmembrane protein complexes, named DeepTMP, by taking advantage of the knowledge pre-trained from a large data set of non-transmembrane proteins. DeepTMP utilizes a geometric triangle-aware module to capture the correct inter-chain interaction from the coevolution information generated by protein language models. DeepTMP is extensively evaluated on a test set of 52 self-associated transmembrane protein complexes, and compared with state-of-the-art methods including DeepHomo2.0, CDPred, GLINTER, DeepHomo, and DNCON2_Inter. It is shown that DeepTMP considerably improves the precision of inter-chain contact prediction and outperforms the existing approaches in both accuracy and robustness.
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Yumeng Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Huanyu Tao
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.

9
Sunny S, Prakash PB, Gopakumar G, Jayaraj PB. DeepBindPPI: Protein-Protein Binding Site Prediction Using Attention Based Graph Convolutional Network. Protein J 2023; 42:276-287. [PMID: 37198346 PMCID: PMC10191823 DOI: 10.1007/s10930-023-10121-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2023] [Indexed: 05/19/2023]
Abstract
Due to the importance of protein-protein interactions in the defence mechanisms of living organisms, attempts have been made to investigate their attributes, including, but not limited to, binding affinity and binding region. Contemporary strategies for binding site prediction largely resort to deep learning techniques but have turned out to be low-precision models. As laboratory experiments for drug discovery tasks utilize this information, increased false positives devalue the computational methods. This emphasizes the need to develop enhanced strategies. DeepBindPPI employs a deep learning technique to predict the binding regions of proteins, particularly antigen-antibody interaction sites. The results obtained are applied in a docking environment to confirm their correctness. An integration of a graph convolutional network with an attention mechanism predicts interacting amino acids with improved precision. The model learns the determining factors in interaction from a general pool of proteins and is then fine-tuned using antigen-antibody data. Comparison of the proposed method with existing techniques shows that the developed model has comparable performance. The use of a separate spatial network clearly improved the precision of the proposed method from 0.4 to 0.5. An attempt to utilize the interface information for docking using the HDOCK server gives promising results, with high-quality structures appearing in the top 10 ranks.
Affiliation(s)
- Sharon Sunny
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
- G. Gopakumar
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
- P. B. Jayaraj
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India

10
Lyu Y, He R, Hu J, Wang C, Gong X. Prediction of the tetramer protein complex interaction based on CNN and SVM. Front Genet 2023; 14:1076904. [PMID: 36777731 PMCID: PMC9909274 DOI: 10.3389/fgene.2023.1076904] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/25/2022] [Accepted: 01/16/2023] [Indexed: 01/27/2023] Open
Abstract
Protein-protein interactions play an important role in life activities. The study of protein-protein interactions helps to better understand the mechanism of protein complex interaction, which is crucial for drug design, protein function annotation and three-dimensional structure prediction of protein complexes. In this paper, we study the tetramer protein complex interaction. The research has two parts. The first part predicts the interaction between chains of the tetramer protein complex. Here, we proposed a feature map to represent a sample generated by two chains of the tetramer protein complex, and constructed a Convolutional Neural Network (CNN) model to predict the interaction between chains. The AUC on the testing set is 0.6263, which indicates that our model can be used to predict the interaction between chains of the tetramer protein complex. The second part predicts the tetramer protein complex interface residue pairs. Here, we proposed a Support Vector Machine (SVM) ensemble method based on under-sampling to predict the interface residue pairs. In the top 10 predictions, when at least one protein-protein interaction interface is correctly predicted, the accuracy of our method is 82.14%. The result shows that our method is effective for the prediction of the tetramer protein complex interface residue pairs.
Affiliation(s)
- Yanfen Lyu
  - Department of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, China
- Ruonan He
  - School of Information, Renmin University of China, Beijing, China
- Jingjing Hu
  - Department of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, China
- Chunxia Wang
  - School of Landscape and Ecological Engineering, Hebei University of Engineering, Handan, China
- Xinqi Gong
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, School of Math, Renmin University of China, Beijing, China
  - Beijing Academy of Artificial Intelligence, Beijing, China
- Correspondence: Chunxia Wang; Xinqi Gong

11
Lin P, Yan Y, Huang SY. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief Bioinform 2023; 24:6849483. [PMID: 36440949 DOI: 10.1093/bib/bbac499] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
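The "precision for the top 10 predicted contacts" metric quoted in this abstract can be computed as in the following minimal sketch. The data layout (a dictionary from residue-index pairs to predicted scores) and the function name `top_k_precision` are assumptions for illustration; the paper's own evaluation code may differ, for example in how native contacts are defined.

```python
def top_k_precision(scores, true_contacts, k=10):
    """Fraction of the k highest-scoring predicted pairs that are real contacts.

    scores: dict mapping (i, j) residue-index pairs to predicted scores.
    true_contacts: set of (i, j) pairs treated as native contacts.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Hypothetical predictions for three inter-chain residue pairs.
predicted = {(1, 5): 0.9, (2, 7): 0.8, (3, 9): 0.1}
native = {(1, 5), (3, 9)}
print(top_k_precision(predicted, native, k=2))
```

With `k=2` only the pairs scored 0.9 and 0.8 are checked, and one of them is a native contact.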
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Yumeng Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China

12
Cong H, Liu H, Cao Y, Chen Y, Liang C. Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism. Interdiscip Sci 2022; 14:421-438. [PMID: 35066812 DOI: 10.1007/s12539-021-00496-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/28/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 12/12/2022]
Abstract
As an important research field in bioinformatics, protein subcellular location prediction is critical to reveal the protein functions and provide insightful information for disease diagnosis and drug development. Predicting protein subcellular locations remains a challenging task due to the difficulty of finding representative features and robust classifiers. Many feature fusion methods have been widely applied to tackle the above issues. However, they still suffer from accuracy loss due to feature redundancy. Furthermore, multiple protein subcellular locations prediction is more complicated since it is fundamentally a multi-label classification problem. The traditional binary classifiers or even multi-class classifiers cannot achieve satisfactory results. This paper proposes a novel method for protein subcellular location prediction with both single and multiple sites based on deep convolutional neural networks. Specifically, we first obtain the integrated features by simultaneously considering the pseudo amino acid, amino acid index distribution, and physicochemical property. We then adopt deep convolutional neural networks to extract high-dimensional features from the fused feature, removing the redundant preliminary features and gaining better representations of the raw sequences. Moreover, we use the self-attention mechanism and a customized loss function to ensure that the model is more inclined to positive data. In addition, we use random k-label sets to reduce the number of prediction labels. Meanwhile, we employ a hybrid strategy of over-sampling and under-sampling to tackle the data imbalance problem. We compare our model with three representative classification alternatives. The experiment results show that our model achieves the best performance in terms of accuracy, demonstrating the efficacy of the proposed model.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China.
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China.
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China

13
St Clair R, Teti M, Pavlovic M, Hahn W, Barenholtz E. Predicting residues involved in anti-DNA autoantibodies with limited neural networks. Med Biol Eng Comput 2022; 60:1279-1293. [PMID: 35303216 PMCID: PMC8932093 DOI: 10.1007/s11517-022-02539-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2021] [Accepted: 01/10/2022] [Indexed: 11/30/2022]
Abstract
Computer-aided rational vaccine design (RVD) and synthetic pharmacology are rapidly developing fields that leverage existing datasets for developing compounds of interest. Computational proteomics utilizes algorithms and models to probe proteins for functional prediction. A potentially strong target for a computational approach is autoimmune antibodies, which are the result of broken tolerance in the immune system, where it cannot distinguish “self” from “non-self”, resulting in an attack on its own structures (mainly proteins and DNA). The information on structure, function, and pathogenicity of autoantibodies may assist in engineering RVD against autoimmune diseases. Current computational approaches exploit large datasets curated with extensive domain knowledge, most of which require many resources and have been applied indirectly to problems of interest for DNA, RNA, and monomer protein binding. We present a novel method for discovering potential binding sites. We employed long short-term memory (LSTM) models trained on FASTA primary sequences to predict protein binding in DNA-binding hydrolytic antibodies (abzymes). We also employed CNN models applied to the same dataset for comparison with LSTM. While the CNN model outperformed the LSTM on the primary task of binding prediction, analysis of internal model representations of both models showed that the LSTM models recovered sub-sequences that were strongly correlated with sites known to be involved in binding. These results demonstrate that analysis of internal processes of LSTM models may serve as a powerful tool for primary sequence analysis.
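Training a sequence model on FASTA primary sequences requires turning residues into numeric inputs. The sketch below shows one common way to do that, parsing FASTA text and one-hot encoding each residue. The abstract does not state which encoding the authors used, so one-hot encoding is an assumption here, and the toy record name and sequence are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector.

    Non-standard residues (for example X) become all-zero rows.
    """
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows

# Made-up record; the sequence is split over two lines as FASTA allows.
example = ">toy_antibody_fragment\nEVQLVES\nGGXL\n"
records = parse_fasta(example)
encoded = one_hot(records["toy_antibody_fragment"])
```

The resulting list of per-residue vectors has one row per position, which is the shape a recurrent or convolutional sequence model would consume after batching and padding.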
Affiliation(s)
- Rachel St Clair
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, USA.
| | - Michael Teti
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, USA
| | - Mirjana Pavlovic
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, USA
| | - William Hahn
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, USA
| | - Elan Barenholtz
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, USA
| |
|
14
|
Xu J, Zhang J, Li J, Wang H, Chen J, Lyu H, Hu Q. Structural and Functional Trajectories of Middle Temporal Gyrus Sub-Regions During Life Span: A Potential Biomarker of Brain Development and Aging. Front Aging Neurosci 2022; 14:799260. [PMID: 35572140 PMCID: PMC9094684 DOI: 10.3389/fnagi.2022.799260] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2021] [Accepted: 02/15/2022] [Indexed: 11/13/2022] Open
Abstract
Although previous studies identified a similar topography pattern of structural and functional delineations in the human middle temporal gyrus (MTG) using healthy adults, trajectories of MTG sub-regions across the lifespan remain largely unknown. Herein, we examined gray matter volume (GMV) and resting-state functional connectivity (RSFC) using datasets from the Nathan Kline Institute (NKI), and aimed to (1) investigate structural and functional trajectories of MTG sub-regions across the lifespan; and (2) assess whether these features can be used as biomarkers to predict an individual’s chronological age. As a result, GMV of all MTG sub-regions followed U-shaped trajectories with extreme age around the sixth decade. The RSFC between MTG sub-regions and many cortical brain regions showed inverted U-shaped trajectories, whereas RSFC between MTG sub-regions and sub-cortical regions/cerebellum showed U-shaped trajectories, with extreme age about 20 years earlier than those of GMV. Moreover, GMV and RSFC of MTG sub-regions could serve as useful features to predict individual age with high estimation accuracy. Together, these results not only provide novel insights into the dynamic process of structural and functional roles of MTG sub-regions across the lifespan, but also offer useful biomarkers for age prediction.
Affiliation(s)
- Jinping Xu
- Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Jinhuan Zhang
- Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Jiaying Li
- Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Haoyu Wang
- Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Jianxiang Chen
- Department of Radiology, Shenzhen Traditional Chinese Medicine Hospital, The Fourth Clinical Medical College, Guangzhou University of Chinese Medicine, Shenzhen, China
| | - Hanqing Lyu
- Department of Radiology, Shenzhen Traditional Chinese Medicine Hospital, The Fourth Clinical Medical College, Guangzhou University of Chinese Medicine, Shenzhen, China
- *Correspondence: Hanqing Lyu,
| | - Qingmao Hu
- Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Qingmao Hu,
| |
|
15
|
Energy-saving service management technology of internet of things using edge computing and deep learning. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00666-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
The purpose is to solve the problems of high transmission rate and low delay in the deployment of mobile edge computing networks, ensure the security and effectiveness of the Internet of things (IoT), and save resources. Dynamic power management is adopted to control the working state transition of Edge Data Center (EDC) servers. A load prediction model based on long short-term memory (LSTM) is proposed. The innovation of the model is to shut down EDC servers that are idle or at low utilization, consider user mobility and EDC location information, learn the globally optimal dynamic timeout threshold strategy and N-policy through a trial-and-error reinforcement learning method, reasonably control the working state switching of the server, and realize load prediction and analysis. The results show that the performance of the AdaGrad optimization solver is the best when the feature dimension is 3, the number of LSTM network layers is 6, the time series length is 30–45, the batch size is 128, the training time is 788 s, the number of units is 250, and the number of times is 350. Compared with the traditional methods, the proposed load prediction model and power management mechanism improve the prediction accuracy by 4.21%. Compared with autoregressive integrated moving average (ARIMA) load prediction, the dynamic power management method of LSTM load prediction can reduce energy consumption by 12.5% and realize the balance between EDC system performance and energy consumption. The system can effectively meet the requirements of multi-access edge computing (MEC) for low delay, high bandwidth and high reliability, reduce unnecessary energy consumption and waste, and reduce the cost of MEC service providers in actual operation. This exploration has important reference value for promoting the energy-saving development of Internet-related industries.
|
16
|
Guo L, He J, Lin P, Huang SY, Wang J. TRScore: a three-dimensional RepVGG-based scoring method for ranking protein docking models. Bioinformatics 2022; 38:2444-2451. [PMID: 35199137 DOI: 10.1093/bioinformatics/btac120] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/22/2021] [Revised: 01/19/2022] [Accepted: 02/21/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein-protein interactions (PPI) play important roles in cellular activities. Due to the technical difficulty and high cost of experimental methods, there is considerable interest in developing computational approaches, such as protein docking, to decipher PPI patterns. One of the important and difficult aspects of protein docking is recognizing near-native conformations from a set of decoys, but unfortunately traditional scoring functions still suffer from limited accuracy. Therefore, new scoring methods are pressingly needed for methodological and/or practical reasons. RESULTS: We present a new deep learning-based scoring method for ranking protein-protein docking models based on a three-dimensional (3D) RepVGG network, named TRScore. To recognize near-native conformations from a set of decoys, TRScore voxelizes the protein-protein interface into a 3D grid labeled by the number of atoms in different physicochemical classes. Benefiting from the deep convolutional RepVGG architecture, TRScore can effectively capture the subtle differences between energetically favorable near-native models and unfavorable non-native decoys without needing extra information. TRScore was extensively evaluated on diverse test sets, including the protein-protein docking benchmark 5.0 update set, the DockGround decoy set, and the realistic CAPRI decoy set, and overall obtained a significant improvement over existing methods in cross validation and independent evaluations. AVAILABILITY: Code is available at https://github.com/BioinformaticsCSU/TRScore.
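The voxelization step described in this abstract, a 3D grid whose cells hold atom counts per physicochemical class, can be sketched in a few lines. This is an illustration only and not the TRScore implementation: the grid size, resolution, class names, and the `voxelize` function are all hypothetical choices.

```python
import math
from collections import defaultdict

def voxelize(atoms, center, box_size=4, resolution=1.0):
    """Count atoms per physicochemical class in each cell of a cubic grid.

    atoms: iterable of (x, y, z, atom_class) tuples.
    center: (x, y, z) of the grid center, e.g. the interface centroid.
    box_size: number of voxels along each axis.
    Returns {(i, j, k): {atom_class: count}} for occupied voxels only.
    """
    half = box_size * resolution / 2.0
    grid = defaultdict(lambda: defaultdict(int))
    for x, y, z, atom_class in atoms:
        idx = tuple(
            math.floor((coord - c + half) / resolution)
            for coord, c in zip((x, y, z), center)
        )
        if all(0 <= i < box_size for i in idx):  # drop atoms outside the box
            grid[idx][atom_class] += 1
    return grid

# Toy coordinates with made-up class labels.
toy_atoms = [
    (0.2, 0.1, 0.3, "hydrophobic"),
    (0.4, 0.6, 0.9, "hydrophobic"),
    (-1.5, 0.2, 0.1, "polar"),
    (9.0, 9.0, 9.0, "charged"),  # outside the 4 x 4 x 4 box
]
grid = voxelize(toy_atoms, center=(0.0, 0.0, 0.0))
```

Each class would become one input channel of a 3D convolutional network; a sparse dictionary is used here only to keep the example dependency-free.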
Affiliation(s)
- Linyuan Guo
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| | - Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Jianxin Wang
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| |
|
17
|
Zhou X, Song H, Li J. Residue-Frustration-Based Prediction of Protein-Protein Interactions Using Machine Learning. J Phys Chem B 2022; 126:1719-1727. [PMID: 35170967 DOI: 10.1021/acs.jpcb.1c10525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
The study of protein-protein interactions (PPIs) is important in understanding the function of proteins. However, it is still a challenge to investigate transient protein-protein interactions by experiments. Hence, the computational prediction of protein-protein interactions draws growing attention. Statistics-based features have been widely used in the studies of protein structure prediction and protein folding. Due to the scarcity of experimental PPI data, it is difficult to construct a conventional statistical feature for PPI prediction, and the application of statistics-based features is very limited in this field. In this paper, we explored the application of frustration, a statistical potential, in PPI prediction. By comparing the energetic contribution of the extra stabilization energy from a given residue pair in the native protein with the statistics of the energies, we obtained the residue pair's frustration index. By calculating the number of residue pairs with a high frustration index, the highly frustrated density, a residue-frustration-based feature, was then obtained to describe the tendency of residues to be involved in PPI. Highly frustrated density, as well as structure-based features, were then used to describe protein residues and combined with the long short-term memory (LSTM) neural network to predict PPI residue pairs. Our model correctly predicted 75% of dimers when only the top 2‰ residue pairs were selected in each dimer. Our model, which considers the statistics-based features, is significantly different from the models based on the chemical features of residues. We found that frustration can effectively describe the tendency of a residue to be involved in PPI. Frustration-based features can replace chemical features in combination with machine learning and achieve better PPI prediction performance. This reveals the great potential of statistical potentials such as frustration in PPI prediction.
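The abstract describes the frustration index as a comparison between a native residue pair's energy and the statistics of a reference energy distribution, and the highly frustrated density as a count of strongly frustrated pairs. A minimal sketch of that idea follows. It assumes a Z-score form, a convention in which lower energy is more stable, and a cutoff of -1 for "highly frustrated"; none of these details are given in the abstract, and the function names and toy energies are hypothetical.

```python
from statistics import mean, pstdev

def frustration_index(native_energy, decoy_energies):
    """Z-score of the native pair energy against a decoy energy distribution.

    Assumed convention: lower energy is more stable, so a native pair that is
    less stable than typical decoys gets a negative index.
    """
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero variance")
    return (mean(decoy_energies) - native_energy) / spread

def highly_frustrated_density(residue, pair_indices, cutoff=-1.0):
    """Count pairs involving `residue` whose index falls below `cutoff`."""
    return sum(
        1 for (i, j), f in pair_indices.items()
        if residue in (i, j) and f < cutoff
    )

decoys = [-1.0, 0.0, 1.0]  # toy decoy energies with mean 0
pair_indices = {
    (1, 2): frustration_index(2.0, decoys),   # native less stable than decoys
    (1, 3): frustration_index(-2.0, decoys),  # native more stable than decoys
    (2, 3): frustration_index(1.5, decoys),   # native less stable than decoys
}
density_res2 = highly_frustrated_density(2, pair_indices)
```

In this toy case residue 2 takes part in two strongly frustrated pairs, so its density is 2, while the favorable pair (1, 3) does not contribute to any residue's count.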
Affiliation(s)
- Xiaozhou Zhou
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoyu Song
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Jingyuan Li
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| |
|
18
|
Yan Y, Huang SY. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Brief Bioinform 2021; 22:bbab038. [PMID: 33693482 PMCID: PMC8425427 DOI: 10.1093/bib/bbab038] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 11/14/2020] [Revised: 01/09/2021] [Indexed: 12/14/2022] Open
Abstract
Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial to understand their molecular mechanisms and develop drugs targeting the protein-protein interactions. Recently, deep learning has led to a breakthrough in intra-protein contact prediction, achieving unusually high accuracy in recent Critical Assessment of protein Structure Prediction (CASP) structure prediction challenges. However, due to the limited number of known homologous protein-protein interactions and the challenge of generating joint multiple sequence alignments of two interacting proteins, the advances in inter-protein contact prediction remain limited. Here, we have proposed a deep learning model, named DeepHomo, to predict inter-protein residue-residue contacts across homo-oligomeric protein interfaces. Unlike previous deep learning approaches, we integrated the intra-protein distance map and inter-protein docking pattern, in addition to evolutionary coupling, sequence conservation, and physico-chemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-Critical Assessment of Predicted Interaction (CAPRI) targets. It was shown that DeepHomo achieved a high precision of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis and machine learning-based approaches. Integrating predicted inter-chain contacts into protein-protein docking significantly improved the docking accuracy on the benchmark dataset of realistic homo-dimeric targets from CASP-CAPRI experiments. DeepHomo is available at http://huanglab.phys.hust.edu.cn/DeepHomo/.
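The "precision for the top predicted contact" quoted in this abstract is a ranking metric: sort the predicted residue pairs by score and check how many of the first k are real contacts. The sketch below shows that computation on made-up scores; the function name, the toy pairs, and the probabilities are hypothetical and not taken from DeepHomo.

```python
def top_k_precision(scores, true_contacts, k):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores: {(i, j): predicted contact probability}
    true_contacts: set of (i, j) pairs observed in the reference structure.
    """
    if k <= 0 or not scores:
        raise ValueError("need k > 0 and at least one scored pair")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)

# Made-up inter-chain pairs and scores for one homo-dimer.
toy_scores = {(3, 17): 0.91, (5, 40): 0.80, (8, 22): 0.35, (10, 31): 0.10}
toy_truth = {(3, 17), (8, 22)}
p1 = top_k_precision(toy_scores, toy_truth, 1)
p2 = top_k_precision(toy_scores, toy_truth, 2)
```

Here the single best-scoring pair is a true contact, so the top-1 precision is 1.0, while only one of the top two pairs is correct, giving 0.5. Averaging the top-1 value over many targets yields a figure comparable in kind to the one reported.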
Affiliation(s)
- Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| |
|
19
|
Hong Z, Liu J, Chen Y. An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction. Biophys Chem 2021; 278:106666. [PMID: 34418678 DOI: 10.1016/j.bpc.2021.106666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/25/2021] [Revised: 08/09/2021] [Accepted: 08/09/2021] [Indexed: 12/29/2022]
Abstract
Protein-protein interaction plays an important role in life activities. A more fine-grained analysis, at the level of residues and atoms, helps us better understand the mechanism of inter-protein interaction and supports drug design. Ongoing studies in the field include developing efficient computational methods to reduce trial and error and assisting experimental researchers in determining complex structures. The trimer protein interface, especially that of homotrimers, has rarely been studied. In this paper, we proposed an interpretable machine learning method for homo-trimeric protein interface residue pair prediction. The structure, sequence, and physicochemical information are integrated as feature input fed to the model for training. A graph model is utilized to represent intra-protein spatial information. Matrix factorization captures the interactions between different features. A kernel function is designed to automatically acquire the adjacent information of our target residue pairs. The accuracy reaches 54.5% on an independent test set. Sequence and structure alignment exhibit the model's ability to learn by itself. Our model indicates the biological significance between sequence and structure, and could help reduce trial and error in fields such as protein complex determination and protein-protein docking. SIGNIFICANCE: Protein complex structures are significant for understanding protein function and promising functional protein design. With data increasing, some computational tools have been developed for protein complex residue contact prediction, which is one of the most significant steps for complex structure prediction. But for homo-trimeric proteins, the sequence-based deep learning predictors are infeasible for homologous sequences, and the algorithmic black box prevents us from understanding each step of the operation. We therefore propose an interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows good performance. Our work provides a computational aid for determining homo-trimeric protein interface residue pairs, which will be further verified by wet experiments, and supports downstream work such as protein-protein docking, protein complex structure prediction, and drug design.
Affiliation(s)
- Zhonghua Hong
- Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing University, Jiaxing 314001, PR China.
| | - Jiale Liu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
| | - Yinggao Chen
- Shantou Central Hospital, Shantou 515041, PR China.
| |
|
20
|
Sunny S, Jayaraj PB. FPDock: Protein-protein docking using flower pollination algorithm. Comput Biol Chem 2021; 93:107518. [PMID: 34048986 DOI: 10.1016/j.compbiolchem.2021.107518] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/25/2021] [Revised: 05/11/2021] [Accepted: 05/16/2021] [Indexed: 11/25/2022]
Abstract
Proteins play their vital role in biological systems through interaction and complex formation with other biological molecules. Indeed, abnormalities in the interaction patterns affect the proteins' structure and have detrimental effects on living organisms. Research in structure prediction matters because the functions of proteins depend on their structures. Protein-protein docking is one of the computational methods devised to understand the interaction between proteins. Metaheuristic algorithms are promising owing to the hardness of the structure prediction problem. In this paper, a variant of the Flower Pollination Algorithm (FPA) is applied to get an accurate protein-protein complex structure. The algorithm begins execution from a randomly generated initial population, which evolves on different isolated islands, each trying to find its local optimum. The abiotic and biotic pollination applied in different generations brings diversity and intensity to the solutions. Each round of pollination applies an energy-based scoring function whose value influences the choice to accept a new solution. Analysis of final predictions based on CAPRI quality criteria shows that the proposed method has a success rate of 58% in the top 10 ranks, which is better than other methods such as SwarmDock, pyDock, and ZDOCK. Source code of the work is available at: https://github.com/Sharon1989Sunny/_FPDock_.
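The pollination loop described in this abstract can be illustrated with a basic flower pollination sketch on a toy objective. This is the textbook form of the algorithm, with Lévy-flight global (biotic) steps and local (abiotic) steps, and it omits the island model and the docking energy function of FPDock. All parameter values, the `sphere` objective, and the function names are hypothetical.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def flower_pollination(objective, dim, n_flowers=20, iterations=200,
                       switch_prob=0.8, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a basic flower pollination loop."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return max(lo, min(hi, value))

    flowers = [[rng.uniform(lo, hi) for _ in range(dim)]
               for _ in range(n_flowers)]
    fitness = [objective(f) for f in flowers]
    best = min(range(n_flowers), key=fitness.__getitem__)
    best_x, best_f = flowers[best][:], fitness[best]
    initial_best_f = best_f

    for _ in range(iterations):
        for i in range(n_flowers):
            if rng.random() < switch_prob:
                # Global (biotic) pollination: Levy flight toward the best flower.
                step = levy_step(rng)
                candidate = [clip(x + step * (g - x))
                             for x, g in zip(flowers[i], best_x)]
            else:
                # Local (abiotic) pollination: mix two randomly chosen flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                candidate = [clip(x + eps * (a - b))
                             for x, a, b in zip(flowers[i], flowers[j], flowers[k])]
            value = objective(candidate)
            if value < fitness[i]:  # accept only if the score improves
                flowers[i], fitness[i] = candidate, value
                if value < best_f:
                    best_x, best_f = candidate[:], value
    return best_x, best_f, initial_best_f

def sphere(x):
    """Toy stand-in for an energy-based docking score."""
    return sum(v * v for v in x)

best_x, best_f, initial_best_f = flower_pollination(sphere, dim=3)
```

In a docking setting, each "flower" would instead encode a rigid-body pose (translation and rotation of one protein relative to the other) and the objective would be the energy-based scoring function; the greedy acceptance rule mirrors the abstract's statement that the score decides whether a new solution is kept.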
Affiliation(s)
- Sharon Sunny
- Department of Computer Science and Engineering, National Institute of Technology Calicut, India.
| | - P B Jayaraj
- Department of Computer Science and Engineering, National Institute of Technology Calicut, India
| |
|
21
|
Kang Q, Meng J, Shi W, Luan Y. Ensemble Deep Learning Based on Multi-level Information Enhancement and Greedy Fuzzy Decision for Plant miRNA-lncRNA Interaction Prediction. Interdiscip Sci 2021; 13:603-614. [PMID: 33900552 DOI: 10.1007/s12539-021-00434-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/13/2021] [Revised: 04/01/2021] [Accepted: 04/16/2021] [Indexed: 12/18/2022]
Abstract
MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are both non-coding RNAs (ncRNAs) and their interactions play important roles in biological processes. Computational methods, such as machine learning and various bioinformatics tools, can predict potential miRNA-lncRNA interactions, which is significant for studying their mechanisms and biological functions. A growing number of RNA interaction predictors for animals have been reported, but they are unreliable for plants due to the differences between animal and plant ncRNAs. It is urgent to build a reliable plant predictor, especially for cross-species prediction. This paper proposes an ensemble deep learning model based on multi-level information enhancement and greedy fuzzy decision (PmliPEMG) for plant miRNA-lncRNA interaction prediction. The fusion complex features, multi-scale convolutional long short-term memory networks, and attention mechanism are adopted to enhance the sample information at the feature, scale, and model levels, respectively. An ensemble deep learning model is built based on a novel method (greedy fuzzy decision) which greatly improves the efficiency. The multi-level information enhancement and greedy fuzzy decision are verified to have positive effects on prediction performance. PmliPEMG can be applied to cross-species prediction. It shows better performance and stronger generalization ability than state-of-the-art predictors and may provide valuable references for related research.
Affiliation(s)
- Qiang Kang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
| | - Wenhao Shi
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
| |
|
22
|
Yoon D, Jang JH, Choi BJ, Kim TY, Han CH. Discovering hidden information in biosignals from patients using artificial intelligence. Korean J Anesthesiol 2020; 73:275-284. [PMID: 31955546 PMCID: PMC7403115 DOI: 10.4097/kja.19475] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/08/2019] [Accepted: 01/14/2020] [Indexed: 12/31/2022] Open
Abstract
Biosignals such as electrocardiogram or photoplethysmogram are widely used for determining and monitoring the medical condition of patients. It was recently discovered that more information could be gathered from biosignals by applying artificial intelligence (AI). At present, one of the most impactful advancements in AI is deep learning. Deep learning-based models can extract important features from raw data without feature engineering by humans, provided the amount of data is sufficient. This AI-enabled feature presents opportunities to obtain latent information that may be used as a digital biomarker for detecting or predicting a clinical outcome or event without further invasive evaluation. However, the black box model of deep learning is difficult to understand for clinicians familiar with a conventional method of analysis of biosignals. A basic knowledge of AI and machine learning is required for the clinicians to properly interpret the extracted information and to adopt it in clinical practice. This review covers the basics of AI and machine learning, and the feasibility of their application to real-life situations by clinicians in the near future.
Affiliation(s)
- Dukyong Yoon
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea; Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea
| | - Jong-Hwan Jang
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
| | - Byung Jin Choi
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
| | - Tae Young Kim
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
| | - Chang Ho Han
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
| |
|