1
Martín M, Bolognesi B. Massive mutagenesis reveals an incomplete amyloid motif in Bri2 that turns amyloidogenic upon C-terminal extension. Proc Natl Acad Sci U S A 2025; 122:e2415521122. [PMID: 40314981 PMCID: PMC12067230 DOI: 10.1073/pnas.2415521122] [Received: 08/01/2024] [Accepted: 03/20/2025] [Indexed: 05/03/2025] Open
Abstract
Stop-loss mutations cause over twenty different diseases. These mutations can have multiple consequences that are, however, hard to predict. Stop-loss in ITM2B/BRI2 results in C-terminal extension of the encoded protein and, upon furin cleavage, in the production of two 34-amino-acid-long peptides, ADan and ABri, that accumulate as amyloids in the brains of patients affected by familial Danish and British Dementia. To systematically explore the consequences of Bri2 C-terminal extension, here we use a yeast-based massively parallel assay to measure amyloid formation for 676 ADan substitutions and identify the region that forms the putative amyloid core of ADan fibrils, located between positions 20 and 26, where stop-loss occurs. Moreover, we measure amyloid formation for ~18,000 random C-terminal extensions of Bri2 and find that ~32% of these sequences can nucleate amyloids. We find that the amino acid composition of these nucleating sequences varies with peptide length and that short extensions of two specific amino acids (aliphatics, aromatics, and cysteines) are sufficient to generate de novo amyloid cores. Overall, our results show that the C-terminus of Bri2 contains an incomplete amyloid motif that can turn amyloidogenic upon extension. C-terminal extension with de novo formation of amyloid motifs may thus be a widespread pathogenic mechanism resulting from stop-loss, highlighting the importance of determining the impact of these mutations for other sequences across the genome.
Affiliation(s)
- Mariano Martín
  - Institute for Bioengineering of Catalonia, The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Benedetta Bolognesi
  - Institute for Bioengineering of Catalonia, The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
2
Thompson M, Martín M, Olmo TS, Rajesh C, Koo PK, Bolognesi B, Lehner B. Massive experimental quantification allows interpretable deep learning of protein aggregation. Science Advances 2025; 11:eadt5111. [PMID: 40305601 PMCID: PMC12042874 DOI: 10.1126/sciadv.adt5111] [Received: 09/29/2024] [Accepted: 03/26/2025] [Indexed: 05/02/2025]
Abstract
Protein aggregation is a pathological hallmark of more than 50 human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the aggregation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts aggregation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict aggregation.
Affiliation(s)
- Mike Thompson
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Mariano Martín
  - Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Trinidad Sanmartín Olmo
  - Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Chandana Rajesh
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Peter K. Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Benedetta Bolognesi
  - Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain
  - ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
3
Schaduangrat N, Chuntakaruk H, Rungrotmongkol T, Mookdarsanit P, Shoombuatong W. M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy. BMC Bioinformatics 2025; 26:117. [PMID: 40307679 PMCID: PMC12044944 DOI: 10.1186/s12859-025-06132-1] [Received: 01/10/2025] [Accepted: 04/03/2025] [Indexed: 05/02/2025] Open
Abstract
Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders, including through innovative machine learning (ML)-based approaches, holds promise in advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Thus, computational approaches leveraging SMILES information for precise in silico identification of GR antagonists are crucial, enabling efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, aimed at rapidly and accurately discovering novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without the use of 3D structural information. In M3S-GRPred, we first constructed different balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base-classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating probabilistic features from the selected base-classifiers derived from a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively address the imbalanced dataset. Compared to traditional ML classifiers, M3S-GRPred attained superior performance in terms of both the training and independent test datasets. Additionally, M3S-GRPred was applied to identify potential GR antagonists among FDA-approved drugs confirmed through molecular docking, followed by detailed MD simulation studies for drug repurposing in Cushing's syndrome. We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.
Affiliation(s)
- Nalini Schaduangrat
  - Faculty of Medical Technology, Center for Research Innovation and Biomedical Informatics, Mahidol University, Bangkok, 10700, Thailand
- Hathaichanok Chuntakaruk
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, 10330, Thailand
  - Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, 10330, Thailand
  - Faculty of Medicine, Center for Artificial Intelligence in Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
- Thanyada Rungrotmongkol
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, 10330, Thailand
  - Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, 10330, Thailand
- Pakpoom Mookdarsanit
  - Faculty of Science, Computer Science and Artificial Intelligence, Chandrakasem Rajabhat University, Bangkok, 10900, Thailand
- Watshara Shoombuatong
  - Faculty of Medical Technology, Center for Research Innovation and Biomedical Informatics, Mahidol University, Bangkok, 10700, Thailand
4
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Stack-AVP: A Stacked Ensemble Predictor Based on Multi-view Information for Fast and Accurate Discovery of Antiviral Peptides. J Mol Biol 2025; 437:168853. [PMID: 39510347 DOI: 10.1016/j.jmb.2024.168853] [Received: 07/11/2024] [Revised: 10/22/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024]
Abstract
AVPs, or antiviral peptides, are short chains of amino acids capable of inhibiting viral replication, preventing viral entry, or disrupting viral membranes. They represent a promising area of research for developing new antiviral therapies due to their potential to target a broad spectrum of viruses, including those resistant to traditional antiviral drugs. However, traditional experimental methods for identifying AVPs are often costly and labour-intensive. Thus far, multiple computational methods have been introduced for the in silico identification of AVPs, but these methods still have certain shortcomings. In this study, we propose a novel stacked ensemble learning framework, termed Stack-AVP, for fast and accurate AVP identification. In Stack-AVP, we investigated heterogeneous prediction models, which were trained with 12 commonly used machine learning algorithms coupled with a wide range of feature encoding schemes. Subsequently, these prediction models were adopted to generate multi-view features providing class information and probability information. Finally, we applied our feature selection method to determine the best feature subset for the construction of the final stacked model. Comparative assessments on the independent test dataset revealed that Stack-AVP surpassed the performance of current state-of-the-art methods, with an accuracy of 0.930, MCC of 0.860, and AUC of 0.975. Furthermore, it was found that our multi-view features exhibited a crucial mechanism to improve the prediction performance of AVPs. To facilitate experimental scientists in performing high-throughput identification of AVPs, the prediction server Stack-AVP is publicly accessible at https://pmlabqsar.pythonanywhere.com/Stack-AVP.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
5
Shoombuatong W, Schaduangrat N, Homdee N, Ahmed S, Chumnanpuen P. Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy. Sci Rep 2025; 15:4762. [PMID: 39922825 PMCID: PMC11807091 DOI: 10.1038/s41598-024-81807-y] [Received: 02/02/2024] [Accepted: 11/29/2024] [Indexed: 02/10/2025] Open
Abstract
Tyrosinase plays a crucial role as an enzyme in the production of melanin, the pigment that determines the color of the hair, eyes, and skin. Tyrosinase inhibitory peptides (TIPs), mainly designed to regulate the activity of the enzyme tyrosinase, are of interest in various domains, including cosmetics, dermatology, and pharmaceuticals, due to their potential applications in controlling skin pigmentation. To date, a few machine learning-based models have been proposed for predicting TIPs, but their predictive performance remains unsatisfactory. In this study, we propose an innovative computational approach, named TIPred-MVFF, to accurately predict TIPs using only sequence information. Firstly, we established an up-to-date and high-quality dataset by collecting samples from various sources. Secondly, we applied a multi-view feature fusion (MVFF) strategy to extract and explore probability and category information embedded in TIPs, employing several machine learning (ML) algorithms coupled with different commonly used sequence-based feature encodings. Then, we employed resampling approaches to address the class imbalance issue. Finally, to maximize the utility of each feature, we fused probability-based and sequence-based features, generating more informative features that were used to develop the final prediction model. Based on the independent test, experimental results showed that TIPred-MVFF outperformed several conventional ML classifiers and existing methods in terms of prediction accuracy and robustness, achieving an accuracy of 0.937 and a Matthews correlation coefficient of 0.847. This new computational approach is anticipated to aid community-wide efforts in rapidly and cost-effectively discovering novel peptides with strong tyrosinase inhibitory activities.
Affiliation(s)
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Saeed Ahmed
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
  - Department of Computer Science, University of Swabi, Swabi, 23561, Pakistan
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
  - Kasetsart University International College (KUIC), Kasetsart University, Bangkok, 10900, Thailand
6
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Deepstack-ACE: A deep stacking-based ensemble learning framework for the accelerated discovery of ACE inhibitory peptides. Methods 2025; 234:131-140. [PMID: 39709069 DOI: 10.1016/j.ymeth.2024.12.005] [Received: 06/21/2024] [Revised: 11/27/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
Identifying angiotensin-I-converting enzyme (ACE) inhibitory peptides accurately is crucial for understanding the primary factor that regulates the renin-angiotensin system and for providing guidance in developing new potential drugs. Given the inherent experimental complexities, using computational methods for in silico peptide identification could be indispensable for facilitating the high-throughput characterization of ACE inhibitory peptides. In this paper, we propose a novel deep stacking-based ensemble learning framework, termed Deepstack-ACE, to precisely identify ACE inhibitory peptides. In Deepstack-ACE, the input peptide sequences are fed into the word2vec embedding technique to generate sequence representations. Then, these representations are used to train five powerful deep learning methods, namely long short-term memory, convolutional neural network, multi-layer perceptron, gated recurrent unit network, and recurrent neural network, for the construction of base-classifiers. Finally, the optimized stacked model is constructed based on the best combination of selected base-classifiers. Benchmarking experiments showed that Deepstack-ACE attained a more accurate and robust identification of ACE inhibitory peptides compared to its base-classifiers and several conventional machine learning classifiers. Remarkably, in the independent test, our proposed model significantly outperformed the current state-of-the-art methods, with a balanced accuracy of 0.916, a sensitivity of 0.911, and a Matthews correlation coefficient of 0.826. Moreover, we developed a user-friendly web server for Deepstack-ACE, which is freely available at https://pmlabqsar.pythonanywhere.com/Deepstack-ACE. We anticipate that our proposed Deepstack-ACE model can provide a faster and reasonably accurate identification of ACE inhibitory peptides.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
7
Schaduangrat N, Khemawoot P, Jiso A, Charoenkwan P, Shoombuatong W. MetaCGRP is a high-precision meta-model for large-scale identification of CGRP inhibitors using multi-view information. Sci Rep 2024; 14:24764. [PMID: 39433940 PMCID: PMC11494111 DOI: 10.1038/s41598-024-75487-x] [Received: 07/24/2024] [Accepted: 10/07/2024] [Indexed: 10/23/2024] Open
Abstract
Migraine is considered one of the debilitating primary headache conditions, with an estimated worldwide occurrence of approximately 14-15%, and contributes substantially to global disability. Calcitonin gene-related peptide (CGRP) is a neuropeptide that plays a crucial role in the pathophysiology of migraines, and thus its inhibition can help relieve migraine symptoms. However, the conventional process of CGRP drug development has been laborious and time-consuming, with incurred costs exceeding one billion dollars. On the other hand, machine learning (ML)-based approaches that are capable of accurately identifying CGRP inhibitors could greatly help expedite the discovery of novel CGRP drugs. Therefore, this study proposes a novel and high-accuracy meta-model, namely MetaCGRP, that can precisely identify CGRP inhibitors. To the best of our knowledge, MetaCGRP is the first SMILES-based approach that has been developed to identify CGRP inhibitors without the use of 3D structural information. In brief, we initially employed different molecular representation methods coupled with popular ML algorithms to construct a pool of baseline models. Then, all baseline models were optimized and used to generate multi-view features. Finally, we employed the feature selection method to optimize the multi-view features and determine the best feature subset to enable the construction of the meta-model. Both cross-validation and independent tests indicated that MetaCGRP clearly outperforms several conventional ML classifiers, with accuracies of 0.898 and 0.799 on the training and independent test datasets, respectively. In addition, MetaCGRP in conjunction with molecular docking was utilized to identify five potential natural product candidates from Thai herbal pharmacopoeia and analyze their binding affinity and interactions to CGRP. To facilitate community-wide efforts in expediting the discovery of novel CGRP inhibitors, a user-friendly web server for MetaCGRP is freely available at https://pmlabqsar.pythonanywhere.com/MetaCGRP.
Affiliation(s)
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phisit Khemawoot
  - Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan, 10540, Thailand
- Apisada Jiso
  - Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan, 10540, Thailand
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
8
Kell DB, Pretorius E. Proteomic Evidence for Amyloidogenic Cross-Seeding in Fibrinaloid Microclots. Int J Mol Sci 2024; 25:10809. [PMID: 39409138 PMCID: PMC11476703 DOI: 10.3390/ijms251910809] [Received: 09/17/2024] [Revised: 10/01/2024] [Accepted: 10/03/2024] [Indexed: 10/20/2024] Open
Abstract
In classical amyloidoses, amyloid fibres form through the nucleation and accretion of protein monomers, with protofibrils and fibrils exhibiting a cross-β motif of parallel or antiparallel β-sheets oriented perpendicular to the fibre direction. These protofibrils and fibrils can intertwine to form mature amyloid fibres. Similar phenomena can occur in blood from individuals with circulating inflammatory molecules (and also some originating from viruses and bacteria). Such pathological clotting can result in an anomalous amyloid form termed fibrinaloid microclots. Previous proteomic analyses of these microclots have shown the presence of non-fibrin(ogen) proteins, suggesting a more complex mechanism than simple entrapment. We thus provide evidence against such a simple entrapment model, noting that clot pores are too large and centrifugation would have removed weakly bound proteins. Instead, we explore whether co-aggregation into amyloid fibres may involve axial (multiple proteins within the same fibril), lateral (single-protein fibrils contributing to a fibre), or both types of integration. Our analysis of proteomic data from fibrinaloid microclots in different diseases shows no significant quantitative overlap with the normal plasma proteome and no correlation between plasma protein abundance and their presence in fibrinaloid microclots. Notably, abundant plasma proteins like α-2-macroglobulin, fibronectin, and transthyretin are absent from microclots, while less abundant proteins such as adiponectin, periostin, and von Willebrand factor are well represented. Using bioinformatic tools, including AmyloGram and AnuPP, we found that proteins entrapped in fibrinaloid microclots exhibit high amyloidogenic tendencies, suggesting their integration as cross-β elements into amyloid structures. This integration likely contributes to the microclots' resistance to proteolysis. 
Our findings underscore the role of cross-seeding in fibrinaloid microclot formation and highlight the need for further investigation into their structural properties and implications in thrombotic and amyloid diseases. These insights provide a foundation for developing novel diagnostic and therapeutic strategies targeting amyloidogenic cross-seeding in blood clotting disorders.
Affiliation(s)
- Douglas B. Kell
  - Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
  - The Novo Nordisk Foundation Centre for Biosustainability, Building 220, Søltofts Plads 200, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
  - Department of Physiological Sciences, Faculty of Science, Stellenbosch University, Private Bag X1 Matieland, Stellenbosch 7602, South Africa
- Etheresia Pretorius
  - Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
  - Department of Physiological Sciences, Faculty of Science, Stellenbosch University, Private Bag X1 Matieland, Stellenbosch 7602, South Africa
9
Thompson M, Martín M, Olmo TS, Rajesh C, Koo PK, Bolognesi B, Lehner B. Massive experimental quantification of amyloid nucleation allows interpretable deep learning of protein aggregation. bioRxiv: The Preprint Server for Biology 2024:2024.07.13.603366. [PMID: 39071305 PMCID: PMC11275847 DOI: 10.1101/2024.07.13.603366] [Indexed: 07/30/2024]
Abstract
Protein aggregation is a pathological hallmark of more than fifty human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the amyloid nucleation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts amyloid nucleation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict amyloid nucleation.
Affiliation(s)
- Mike Thompson
  - Systems and Synthetic Biology, Centre for Genomic Regulation, The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Mariano Martín
  - Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Trinidad Sanmartín Olmo
  - Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Chandana Rajesh
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Peter K. Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Benedetta Bolognesi
  - Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Ben Lehner
  - Systems and Synthetic Biology, Centre for Genomic Regulation, The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
  - University Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
10
Shoombuatong W, Meewan I, Mookdarsanit L, Schaduangrat N. Stack-HDAC3i: A high-precision identification of HDAC3 inhibitors by exploiting a stacked ensemble-learning framework. Methods 2024; 230:147-157. [PMID: 39191338 DOI: 10.1016/j.ymeth.2024.08.003] [Received: 06/21/2024] [Revised: 08/07/2024] [Accepted: 08/17/2024] [Indexed: 08/29/2024] Open
Abstract
Epigenetics involves reversible modifications in gene expression without altering the genetic code itself. Among the enzymes responsible for these modifications, histone deacetylases (HDACs) play a key role by removing acetyl groups from lysine residues on histones. Overexpression of HDACs is linked to the proliferation and survival of tumor cells. To combat this, HDAC inhibitors (HDACi) are commonly used in cancer treatments. However, pan-HDAC inhibition can lead to numerous side effects. Therefore, isoform-selective HDAC inhibitors, such as HDAC3i, could be advantageous for treating various medical conditions while minimizing off-target effects. To date, computational approaches that use only the SMILES notation without any experimental evidence have become increasingly popular and necessary for the initial discovery of novel potential therapeutic drugs. In this study, we develop an innovative and high-precision stacked-ensemble framework, called Stack-HDAC3i, which can directly identify HDAC3i using only the SMILES notation. Using an up-to-date benchmark dataset, we first employed both molecular descriptors and Mol2Vec embeddings to generate feature representations that cover multi-view information embedded in HDAC3i, such as structural and contextual information. Subsequently, these feature representations were used to train baseline models using nine popular ML algorithms. Finally, the probabilistic features derived from the selected baseline models were fused to construct the final stacked model. Both cross-validation and independent tests showed that Stack-HDAC3i is a high-accuracy prediction model with great generalization ability for identifying HDAC3i. Furthermore, in the independent test, Stack-HDAC3i achieved an accuracy of 0.926 and a Matthews correlation coefficient of 0.850, which are 0.44-6.11% and 0.83-11.90% higher than its constituent baseline models, respectively.
Affiliation(s)
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Ittipat Meewan
  - Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom 73170, Thailand
- Lawankorn Mookdarsanit
  - Business Information System, Faculty of Management Science, Chandrakasem Rajabhat University, Bangkok 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
11
Evans M, Hagan R, Boyd OJ, Bondetti M, Craig OE, Collins MJ, Hendy J. The impact of cooking and burial on proteins: a characterisation of experimental foodcrusts and ceramics. Royal Society Open Science 2024; 11:240610. [PMID: 39416716 PMCID: PMC11482021 DOI: 10.1098/rsos.240610] [Received: 04/12/2024] [Revised: 07/31/2024] [Accepted: 08/01/2024] [Indexed: 10/19/2024]
Abstract
Foodcrusts have received relatively little attention in the burgeoning field of proteomic analysis of ancient cuisine. We remain ignorant of how cooking and burial impact protein survival, and crucially, the extent to which the extractome reflects the composition of input ingredients. Therefore, through experimental analogues, we explore the extent of protein survival in unburied and buried foodcrusts and ceramics using 'typical' Mesolithic ingredients (red deer, Atlantic salmon and sweet chestnut). We then explore a number of physicochemical properties theorised to aid protein preservation. The results reveal that proteins were much more likely to be detected in foodcrusts than ceramics using the methodology employed, that input ingredient strongly influences protein preservation, and that degradation is not universal nor linear between proteins, indicating that multiple protein physicochemical properties are at play. While certain properties such as hydrophobicity apparently aid protein preservation, none single-handedly explain why particular proteins/peptides survive in buried foodcrusts: this complex interplay requires further investigation. The findings demonstrate that proteins indicative of the input ingredient can be identifiable in foodcrust, but that the full proteome is unlikely to preserve. While this shows promise for the survival of proteins in archaeological foodcrust, further research is needed to accurately interpret foodcrust extractomes.
Affiliation(s)
- Miranda Evans
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
  - BioArCh, Department of Archaeology, University of York, York, UK
- Richard Hagan
  - BioArCh, Department of Archaeology, University of York, York, UK
- Manon Bondetti
  - BioArCh, Department of Archaeology, University of York, York, UK
- Oliver E. Craig
  - BioArCh, Department of Archaeology, University of York, York, UK
- Matthew J. Collins
  - McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK
  - The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jessica Hendy
  - BioArCh, Department of Archaeology, University of York, York, UK

12
Rukh G, Akbar S, Rehman G, Alarfaj FK, Zou Q. StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning. BMC Bioinformatics 2024; 25:256. [PMID: 39098908 PMCID: PMC11298090 DOI: 10.1186/s12859-024-05884-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 07/29/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Antioxidant proteins are involved in several biological processes and can protect DNA and cells from the damage of free radicals. These proteins regulate the body's oxidative stress and perform a significant role in many antioxidant-based drugs. The current in vitro-based medications are costly, time-consuming, and unable to efficiently screen and identify the targeted motif of antioxidant proteins. METHODS We propose an accurate prediction method for discriminating antioxidant proteins, named StackedEnC-AOP. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed via two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT) based embedded vector. Additionally, the evolutionary difference formula and composite physicochemical properties methods are employed to collect structural and sequential descriptors. The combined vector of sequential features, evolutionary descriptors, and physicochemical properties is then produced to cover the flaws of individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are chosen using minimum redundancy and maximum relevance (mRMR). A stacking-based ensemble meta-model is trained on the optimal feature vector. RESULTS The StackedEnC-AOP method achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. On an independent set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98. CONCLUSION The proposed StackedEnC-AOP strategy performed significantly better than current computational models, with ~5% and ~3% improved accuracy on the training and independent sets, respectively. The efficacy and consistency of StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.
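The two-level DWT decomposition mentioned in the METHODS can be illustrated on a single numeric column, such as one column of a PSSM. The abstract does not name the wavelet, so the sketch below assumes the orthonormal Haar wavelet; the function names and the padding rule are illustrative choices, not the authors' implementation.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last value.
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def two_level_dwt(signal):
    """Two-level decomposition: returns (A2, D2, D1) coefficient bands."""
    a1, d1 = haar_step(signal)
    a2, d2 = haar_step(a1)
    return a2, d2, d1


# Hypothetical PSSM column, for illustration only.
column = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0, 8.0, 6.0]
a2, d2, d1 = two_level_dwt(column)
print(len(a2), len(d2), len(d1))  # 2 2 4
```

Summary statistics of each band (for example mean and standard deviation) are one common way to turn such coefficients into a fixed-length feature vector; whether StackedEnC-AOP does this is not stated in the abstract.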
Affiliation(s)
- Gul Rukh
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Gauhar Rehman
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Fawaz Khaled Alarfaj
  - Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), 31982, Al-Ahsa, Saudi Arabia
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China

13
Bárcenas O, Kuriata A, Zalewski M, Iglesias V, Pintado-Grima C, Firlik G, Burdukiewicz M, Kmiecik S, Ventura S. Aggrescan4D: structure-informed analysis of pH-dependent protein aggregation. Nucleic Acids Res 2024; 52:W170-W175. [PMID: 38738618 PMCID: PMC11223845 DOI: 10.1093/nar/gkae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/11/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024] Open
Abstract
Protein aggregation is behind the genesis of incurable diseases and imposes constraints on drug discovery and the industrial production and formulation of proteins. Over the years, we have been advancing the Aggrescan3D (A3D) method, aiming to deepen our comprehension of protein aggregation and assist the engineering of protein solubility. Since its inception, A3D has become one of the most popular structure-based aggregation predictors because of its performance, modular functionalities, RESTful service for extensive screenings, and intuitive user interface. Building on this foundation, we introduce Aggrescan4D (A4D), significantly extending A3D's functionality. A4D is aimed at predicting the pH-dependent aggregation of protein structures, and features an evolutionary-informed automatic mutation protocol to engineer protein solubility without compromising structure and stability. It also integrates precalculated results for the nearly 500,000 jobs in the A3D Model Organisms Database and structure retrieval from the AlphaFold database. Globally, A4D constitutes a comprehensive tool for understanding, predicting, and designing solutions for specific protein aggregation challenges. The A4D web server and extensive documentation are available at https://biocomp.chem.uw.edu.pl/a4d/. This website is free and open to all users without a login requirement.
Affiliation(s)
- Oriol Bárcenas
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Aleksander Kuriata
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Mateusz Zalewski
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Valentín Iglesias
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Kilińskiego 1, 15-369 Białystok, Poland
- Carlos Pintado-Grima
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Grzegorz Firlik
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Michał Burdukiewicz
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Kilińskiego 1, 15-369 Białystok, Poland
- Sebastian Kmiecik
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Salvador Ventura
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain

14
Arif R, Kanwal S, Ahmed S, Kabir M. A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features. Interdiscip Sci 2024; 16:503-518. [PMID: 38733473 DOI: 10.1007/s12539-024-00628-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/16/2024] [Accepted: 03/27/2024] [Indexed: 05/13/2024]
Abstract
Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. The identification of THPs can provide crucial insights for the drug-discovery and pharmaceutical industries, as they allow for tailored medication delivery towards cancer cells. These peptides have a high affinity for particular receptors present on tumor surfaces, allowing for the creation of precision medications that reduce off-target consequences and enhance cancer patient treatment results. Wet-lab techniques are considered essential tools for studying THPs; however, they are labor-intensive and time-consuming, which makes the identification of THPs a challenging task for researchers. Computational techniques, on the other hand, are considered significant tools for identifying THPs from sequence data. Although many strategies have been presented to predict new THPs, there is still a need to develop a robust method with higher rates of success. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large scale. Firstly, the peptide sequences are encoded through various sequential features. Secondly, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble framework is formed by integrating sequential and deep features, which are fed to a support vector machine, with 10-fold cross-validation used to validate its efficiency. The experimental results showed that THP-DF worked better on both [Formula: see text] and [Formula: see text] datasets, achieving an accuracy of > 95%, which is higher than that of existing predictors on both datasets. This indicates that the proposed predictor could be a beneficial tool to precisely and rapidly identify THPs and will contribute to cutting-edge cancer treatment strategies and pharmaceuticals.
Affiliation(s)
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan

15
Kumar S, Davis RM, Ruiz N. YdbH and YnbE form an intermembrane bridge to maintain lipid homeostasis in the outer membrane of Escherichia coli. Proc Natl Acad Sci U S A 2024; 121:e2321512121. [PMID: 38748582 PMCID: PMC11126948 DOI: 10.1073/pnas.2321512121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 04/09/2024] [Indexed: 05/27/2024] Open
Abstract
The outer membrane (OM) of didermic gram-negative bacteria is essential for growth, maintenance of cellular integrity, and innate resistance to many antimicrobials. Its asymmetric lipid distribution, with phospholipids in the inner leaflet and lipopolysaccharides (LPS) in the outer leaflet, is required for these functions. Lpt proteins form a transenvelope bridge that transports newly synthesized LPS from the inner membrane (IM) to OM, but how the bulk of phospholipids are transported between these membranes is poorly understood. Recently, three members of the AsmA-like protein family, TamB, YhdP, and YdbH, were shown to be functionally redundant and were proposed to transport phospholipids between IM and OM in Escherichia coli. These proteins belong to the repeating β-groove superfamily, which includes eukaryotic lipid-transfer proteins that mediate phospholipid transport between organelles at contact sites. Here, we show that the IM-anchored YdbH protein interacts with the OM lipoprotein YnbE to form a functional protein bridge between the IM and OM in E. coli. Based on AlphaFold-Multimer predictions, genetic data, and in vivo site-directed cross-linking, we propose that YnbE interacts with YdbH through β-strand augmentation to extend the continuous hydrophobic β-groove of YdbH that is thought to shield acyl chains of phospholipids as they travel through the aqueous intermembrane periplasmic compartment. Our data also suggest that the periplasmic protein YdbL prevents extensive amyloid-like multimerization of YnbE in cells. We, therefore, propose that YdbL has a chaperone-like function that prevents uncontrolled runaway multimerization of YnbE to ensure the proper formation of the YdbH-YnbE intermembrane bridge.
Affiliation(s)
- Sujeet Kumar
  - Department of Microbiology, The Ohio State University, Columbus, OH 43210
- Rebecca M. Davis
  - Department of Microbiology, The Ohio State University, Columbus, OH 43210
- Natividad Ruiz
  - Department of Microbiology, The Ohio State University, Columbus, OH 43210

16
Abbasi AF, Asim MN, Ahmed S, Dengel A. Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns. Sci Rep 2024; 14:9466. [PMID: 38658614 PMCID: PMC11043385 DOI: 10.1038/s41598-024-57457-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024] Open
Abstract
Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and potential therapeutic approaches. Conventionally, wet lab-based methods are used to identify leccDNA, but they are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, this paper presents the first computational predictor. The proposed iLEC-DNA predictor uses an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical properties-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms baseline predictors and encoder ensembles across diverse leccDNA datasets by producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany

17
Behera DP, Subadini S, Freudenberg U, Sahoo H. Sulfation of hyaluronic acid reconfigures the mechanistic pathway of bone morphogenetic protein-2 aggregation. Int J Biol Macromol 2024; 263:130128. [PMID: 38350587 DOI: 10.1016/j.ijbiomac.2024.130128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 02/03/2024] [Accepted: 02/09/2024] [Indexed: 02/15/2024]
Abstract
Bone morphogenetic protein-2 (BMP-2) is a critical growth factor of bone extracellular matrix (ECM), pivotal for osteogenesis. Glycosaminoglycans (GAGs), another vital class of ECM biomolecules, interact with growth factors, affecting signal transduction. Our study primarily focused on hyaluronic acid (HA), a prevalent GAG, and its sulfated derivative (SHA). We explored their impact on BMP-2's conformation, aggregation, and mechanistic pathways of aggregation using diverse optical and rheological methods. In the presence of HA and SHA, the secondary structure of BMP-2 underwent a structured transformation, characterized by a substantial increase in beta sheet content, and a detrimental alteration, manifesting as a shift towards unstructured content, respectively. Although both HA and SHA induced BMP-2 aggregation, their mechanisms differed. SHA led to rapid amorphous aggregates, while HA promoted amyloid fibrils with a lag phase and sigmoidal kinetics. Aggregate size and shape varied; HA produced larger structures, SHA smaller. Each aggregation type followed distinct pathways influenced by viscosity and excluded volume. Higher viscosity, lower protein diffusivity, and higher excluded volume in the presence of HA promote fibrillation, with fibrils in the micrometer size range. Lower viscosity, higher protein diffusivity, and lower excluded volume lead to amorphous aggregates in the nanometer size range.
Affiliation(s)
- Devi Prasanna Behera
  - Biophysical and Protein Chemistry Lab, Department of Chemistry, National Institute of Technology, Rourkela 769008, Odisha, India
- Suchismita Subadini
  - Biophysical and Protein Chemistry Lab, Department of Chemistry, National Institute of Technology, Rourkela 769008, Odisha, India
- Uwe Freudenberg
  - Institute of Polymer Research, Technical University Dresden, 01307 Dresden, Germany
- Harekrushna Sahoo
  - Biophysical and Protein Chemistry Lab, Department of Chemistry, National Institute of Technology, Rourkela 769008, Odisha, India
  - Center for Nanomaterials, National Institute of Technology, Rourkela 769008, Odisha, India

18
Kanwal S, Arif R, Ahmed S, Kabir M. A novel stacking-based predictor for accurate prediction of antimicrobial peptides. J Biomol Struct Dyn 2024:1-12. [PMID: 38500243 DOI: 10.1080/07391102.2024.2329298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 03/06/2024] [Indexed: 03/20/2024]
Abstract
Antimicrobial peptides (AMPs) are gaining acceptance and support as a chief antibiotic substitute since they boost human immunity. They retain a wide range of actions and have a low risk of developing resistance, which are critical properties to the pharmaceutical industry for drug discovery. Antibiotic sensitivity, however, is an issue that affects people all around the world and has the potential to one day lead to an epidemic. As cutting-edge therapeutic agents, AMPs are also expected to cure microbial infections. In order to produce tolerable drugs, it is crucial to understand the significance of the basic architecture of AMPs. Traditional laboratory methods are expensive and time-consuming for AMPs testing and detection. Currently, bioinformatics techniques are being successfully applied to the detection of AMPs. In this study, we have developed a novel STacking-based ensemble learning framework for AntiMicrobial Peptide (STAMP) prediction. First, we constructed 84 different baseline models by using 12 different feature encoding schemes and 7 popular machine learning algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, based on the feature selection strategy, we determined the optimal probabilistic feature vector, which was further utilized for the construction of our stacked model. As a result, the STAMP predictor achieved excellent performance during cross-validation, with an accuracy and Matthews correlation coefficient of 0.930 and 0.860, respectively. The corresponding metrics during the independent test were 0.710 and 0.464, respectively. Overall, STAMP achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, STAMP is expected to assist community-wide efforts in identifying AMPs and will contribute to the development of novel therapeutic methods and drug design for immunity.
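The first two steps of the framework, a 12 x 7 grid of baseline models whose predicted probabilities form a new feature vector, can be sketched as follows. This is only a structural illustration: the encoding and algorithm names are placeholders (the abstract gives only the counts), and `toy_model` is a hypothetical stand-in for a trained baseline model, not anything from the study.

```python
from itertools import product

# Placeholder names: the abstract states only that there are 12 encodings
# and 7 algorithms, not which ones.
ENCODINGS = [f"encoding_{i}" for i in range(1, 13)]
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 8)]


def baseline_grid():
    """Every (encoding, algorithm) pair; one baseline model per pair."""
    return list(product(ENCODINGS, ALGORITHMS))


def toy_model(seed):
    """Hypothetical stand-in for a trained baseline model."""
    def predict_proba(peptide):
        # Toy score: fraction of cationic residues (K, R), scaled per model.
        cationic = sum(peptide.count(aa) for aa in "KR")
        return min(1.0, cationic / len(peptide) * (1 + seed / 100))
    return predict_proba


def probabilistic_features(peptide, models):
    """One predicted probability per baseline model, in grid order."""
    return [models[key](peptide) for key in baseline_grid()]


models = {key: toy_model(i) for i, key in enumerate(baseline_grid())}
features = probabilistic_features("GIGKFLKKAKKFGKAFVKILKK", models)
print(len(features))  # 84
```

In a real stacking pipeline the probabilities used to train the meta-model are normally out-of-fold predictions, so that the meta-model does not see outputs from models that were fit on the same samples; feature selection would then reduce the 84-dimensional vector before the final model is trained.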
Affiliation(s)
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

19
Khalid M, Ali F, Alghamdi W, Alzahrani A, Alsini R, Alzahrani A. An ensemble computational model for prediction of clathrin protein by coupling machine learning with discrete cosine transform. J Biomol Struct Dyn 2024:1-9. [PMID: 38498362 DOI: 10.1080/07391102.2024.2329777] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/09/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Clathrin protein (CP) plays a pivotal role in numerous cellular processes, including endocytosis, signal transduction, and neuronal function. Dysregulation of CP has been associated with a spectrum of diseases. Given its involvement in various cellular functions, CP has garnered significant attention for its potential applications in drug design and medicine, ranging from targeted drug delivery to addressing viral infections, neurological disorders, and cancer. The accurate identification of CP is crucial for unraveling its function and devising novel therapeutic strategies. Computational methods offer a rapid, cost-effective, and less labor-intensive alternative to traditional identification methods, making them especially appealing for high-throughput screening. This paper introduces CL-Pred, a novel computational method for CP identification. CL-Pred leverages three feature descriptors: Dipeptide Deviation from Expected Mean (DDE), Bigram Position Specific Scoring Matrix (BiPSSM), and Position Specific Scoring Matrix-Tetra Slice-Discrete Cosine Transform (PSSM-TS-DCT). The model is trained using three classifiers: Support Vector Machine (SVM), Extremely Randomized Tree (ERT), and Light eXtreme Gradient Boosting (LiXGB). Notably, the LiXGB-based model achieves outstanding performance, demonstrating accuracies of 94.63% and 93.65% on the training and testing datasets, respectively. The proposed CL-Pred method is poised to significantly advance our comprehension of clathrin-mediated endocytosis, cellular physiology, and disease pathogenesis. Furthermore, it holds promise for identifying potential drug targets across a spectrum of diseases.
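The discrete cosine transform named in the PSSM-TS-DCT descriptor concentrates most of a smooth signal's energy in its first few coefficients, which is why it is often used to compress profile columns into short feature vectors. The sketch below shows only the generic, unnormalised DCT-II on one numeric column; the tetra-slice step is not described in the abstract and is not reproduced, and the `keep` parameter is a hypothetical choice.

```python
import math


def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]


def compress_column(column, keep):
    """Keep only the first `keep` low-frequency coefficients of a column."""
    return dct2(column)[:keep]


# Hypothetical PSSM column, for illustration only.
column = [0.1, 0.5, 0.2, 0.9, 0.3]
low_freq = compress_column(column, keep=3)
print(len(low_freq))  # 3
```

For a constant input, only the first (DC) coefficient is non-zero, which is a quick sanity check on any DCT-II implementation.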
Affiliation(s)
- Majdi Khalid
  - Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
- Farman Ali
  - Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Mardan, Pakistan
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Abdulrahman Alzahrani
  - Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Raed Alsini
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Ahmed Alzahrani
  - College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

20
Yang Z, Wu Y, Liu H, He L, Deng X. AMYGNN: A Graph Convolutional Neural Network-Based Approach for Predicting Amyloid Formation from Polypeptides. J Chem Inf Model 2024; 64:1751-1762. [PMID: 38408296 DOI: 10.1021/acs.jcim.3c02035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
There has been an increasing interest in the use of amyloids for constructing various functional materials. The design of amyloid-associated functional materials requires the identification of the core peptide sequences as the fundamental building block. The existing computational methods are limited in terms of delineating polypeptides, the typical non-Euclidean structural data, and they fail to capture the dynamic interactions between amino acids due to ignoring the contextual information from surrounding amino acids. Here, we first propose the use of a state-of-the-art graph convolutional neural network for predicting the trends of amyloid formation from specific peptide sequences (AMYGNN) by abstracting each polypeptide as a graph, in which the constituting amino acids are viewed as nodes and edges characterizing the connections between pairs of amino acids are established when they meet a given distance threshold (Cα-Cα ≤ 5 Å). Our model achieves high performance with accuracy (0.9208), G-mean (0.9203), MCC (0.8417), and F1 (0.9235) in determining the characteristic peptide sequences to form amyloid. 32 of 534 crucial amino acid properties that greatly contribute to the formation of amyloids are ascertained, and the β-folding-like graph structure of a polypeptide is believed to be essential for the formation of amyloid. Our model enables the mapping of polypeptides with underlying interactions between amino acids and provides a quick and precise predictive framework for directing the construction of amyloid-associated functional materials.
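The graph abstraction described here (amino acids as nodes, an edge wherever two Cα atoms lie within 5 Å) can be sketched in a few lines. The example assumes that Cα coordinates are already available for the peptide; how AMYGNN obtains them is not stated in the abstract, and the function name and the idealised coordinates are illustrative.

```python
import math
from itertools import combinations


def build_contact_graph(sequence, ca_coords, threshold=5.0):
    """Return (nodes, edges): residues as nodes, Calpha pairs within threshold as edges."""
    if len(sequence) != len(ca_coords):
        raise ValueError("one Calpha coordinate is required per residue")
    nodes = list(sequence)
    edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if math.dist(ca_coords[i], ca_coords[j]) <= threshold  # distance in angstroms
    ]
    return nodes, edges


# Hypothetical 4-residue peptide laid out on a line with 3.8 A spacing,
# roughly the distance between consecutive Calpha atoms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
nodes, edges = build_contact_graph("GNNQ", coords)
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

With a fully extended chain only sequential neighbours fall under the threshold; folded or β-sheet-like geometries add non-sequential edges, which is the structural signal a graph convolutional network can then exploit.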
Affiliation(s)
- Zuojun Yang
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Yuhan Wu
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Hao Liu
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Li He
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Xiaoyuan Deng
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China

21
Gullulu O, Ozcelik E, Tuzlakoglu Ozturk M, Karagoz MS, Tazebay UH. A multi-faceted approach to unravel coding and non-coding gene fusions and target chimeric proteins in ataxia. J Biomol Struct Dyn 2024:1-21. [PMID: 38411012 DOI: 10.1080/07391102.2024.2321510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/15/2024] [Indexed: 02/28/2024]
Abstract
Ataxia represents a heterogeneous group of neurodegenerative disorders characterized by a loss of balance and coordination, often resulting from mutations in genes vital for cerebellar function and maintenance. Recent advances in genomics have identified gene fusion events as critical contributors to various cancers and neurodegenerative diseases. However, their role in ataxia pathogenesis remains largely unexplored. Our study delved into this possibility by analyzing RNA sequencing data from 1443 diverse samples, including cell and mouse models, patient samples, and healthy controls. We identified 7067 novel gene fusions, potentially pivotal in disease onset. These fusions, notably in-frame, could produce chimeric proteins, disrupt gene regulation, or introduce new functions. We observed conservation of specific amino acids at fusion breakpoints and identified potential aggregate formations in fusion proteins, known to contribute to ataxia. Through AI-based protein structure prediction, we identified topological changes in three high-confidence fusion proteins (TEN1-ACOX1, PEX14-NMNAT1, and ITPR1-GRID2) which could potentially alter their functions. Subsequent virtual drug screening identified several molecules and peptides with high-affinity binding to fusion sites. Molecular dynamics simulations confirmed the stability of these protein-ligand complexes at fusion breakpoints. Additionally, we explored the role of non-coding RNA fusions as miRNA sponges. One such fusion, RP11-547P4-FLJ33910, showed strong interaction with hsa-miR-504-5p, potentially acting as its sponge. This interaction correlated with the upregulation of hsa-miR-504-5p target genes, some previously linked to ataxia. In conclusion, our study unveils new aspects of gene fusions in ataxia, suggesting their significant role in pathogenesis and opening avenues for targeted therapeutic interventions. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Omer Gullulu
  - Department of Structural Biology, St Jude Children's Research Hospital, Memphis, TN, USA
- Emrah Ozcelik
  - Department of Molecular Biology and Genetics, Gebze Technical University, Gebze, Kocaeli, Turkey
  - Central Research Laboratory (GTU-MAR), Gebze Technical University, Gebze, Kocaeli, Turkey
- Merve Tuzlakoglu Ozturk
  - Department of Molecular Biology and Genetics, Gebze Technical University, Gebze, Kocaeli, Turkey
  - Central Research Laboratory (GTU-MAR), Gebze Technical University, Gebze, Kocaeli, Turkey
- Mustafa Safa Karagoz
  - Institut für Mikrobiologie, Technische Universität Braunschweig, Braunschweig, Germany
  - Biochemistry and Biophysics Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Uygar Halis Tazebay
  - Department of Molecular Biology and Genetics, Gebze Technical University, Gebze, Kocaeli, Turkey
  - Central Research Laboratory (GTU-MAR), Gebze Technical University, Gebze, Kocaeli, Turkey
22
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as a promising and alternative therapeutic agent for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods that are capable of accurately contributing in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 over the independent test dataset, which were 5.79% and 11.76% higher than the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors over both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down the potential NaBPs.
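The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four cells of a binary confusion matrix. The following is a minimal sketch for readers unfamiliar with MCC; the counts are hypothetical and chosen for illustration only, not taken from the MetaNaBP paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not from the paper).
tp, tn, fp, fn = 45, 45, 5, 5
print(round(accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is commonly reported alongside accuracy on imbalanced peptide datasets.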
Affiliation(s)
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
23
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework. J Biomol Struct Dyn 2024:1-13. [PMID: 38385478 DOI: 10.1080/07391102.2024.2318482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/08/2024] [Indexed: 02/23/2024]
Abstract
Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we develop StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthew's correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, as judged by the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers. 
To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
24
Zöller B, Manderstedt E, Lind-Halldén C, Halldén C. Rare-variant collapsing and bioinformatic analyses for different types of cardiac arrhythmias in the UK Biobank reveal novel susceptibility loci and candidate amyloid-forming proteins. Cardiovasc Digit Health J 2024; 5:15-18. [PMID: 38390584 PMCID: PMC10879009 DOI: 10.1016/j.cvdhj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024] Open
Abstract
Background Cardiac arrhythmias are a common health problem. Both common and rare genetic risk factors exist for cardiac arrhythmias. Cardiac amyloidosis is a rare disease that may manifest various arrhythmias. Few large-scale whole exome sequencing studies elucidating the contribution of rare variations to arrhythmias have been published. Objective To access gene collapsing analysis of rare variations for different types of cardiac arrhythmias in UK Biobank. Identified genes were analyzed in silico for probability to form amyloid fibrils. Methods We used 2 published UK Biobank portals (https://azphewas.com/ and https://app.genebass.org/) to access gene collapsing analysis of rare variations for different types of cardiac arrhythmias. Diagnosis of arrhythmia was based on the International Classification of Diseases, 10th Revision (ICD-10) codes: conduction disorders (I44, I45), paroxysmal tachycardia (I47), atrial fibrillation (I48), and other arrhythmias (I49). Results Rare variations in 5 genes were linked to conduction disorders (SCN5A, LMNA, SMAD6, HSPB9, TMEM95). The TTN gene was associated with both paroxysmal tachycardia and other arrhythmias. Atrial fibrillation was associated with rare variations in 8 genes (TTN, RPL3L, KLF1, TET2, NME3, KDM5B, PKP2, PMVK). Two of the genes linked to heart conduction disorders were potential amyloid-forming proteins (HSPB9, TMEM95), while none of the 8 genes linked to other types of arrhythmias were potential amyloid-forming proteins. Conclusion Rare variations in 13 genes were associated with arrhythmias in the UK Biobank. Two of the heart conduction disorder-linked genes are potential amyloid-forming candidates. Amyloid formation may be an underestimated cause of heart conduction disorders.
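The ICD-10 grouping described in the Methods can be written down as a small lookup table. This is an illustrative sketch only: the code-to-group mapping is the one listed in the abstract, while the function name and the choice to match on the three-character ICD-10 category (so that subcodes such as I48.0 fall under I48) are assumptions made here, not details given by the authors.

```python
# Code-to-group mapping as listed in the abstract's Methods.
ARRHYTHMIA_GROUPS = {
    "I44": "conduction disorders",
    "I45": "conduction disorders",
    "I47": "paroxysmal tachycardia",
    "I48": "atrial fibrillation",
    "I49": "other arrhythmias",
}


def classify_icd10(code):
    """Return the arrhythmia group for an ICD-10 code, or None if not covered.

    Assumption: only the three-character category is used, so 'I48.0'
    and 'I480' are both treated as 'I48'.
    """
    category = code.strip().upper().replace(".", "")[:3]
    return ARRHYTHMIA_GROUPS.get(category)


for example in ["I44.1", "I48.0", "I49", "I50.9"]:
    print(example, "->", classify_icd10(example))
```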
Affiliation(s)
- Bengt Zöller
  - Center for Primary Health Care Research, Department of Clinical Sciences, Lund University and Region Skåne, Malmö, Sweden
- Eric Manderstedt
  - Center for Primary Health Care Research, Department of Clinical Sciences, Lund University and Region Skåne, Malmö, Sweden
- Christer Halldén
  - Center for Primary Health Care Research, Department of Clinical Sciences, Lund University and Region Skåne, Malmö, Sweden
25
Fehér E, Kaszab E, Mótyán JA, Máté D, Bali K, Hoitsy M, Sós E, Jakab F, Bányai K. Structural similarity of human papillomavirus E4 and polyomaviral VP4 exhibited by genomic analysis of the common kestrel (Falco tinnunculus) polyomavirus. Vet Res Commun 2024; 48:309-315. [PMID: 37688754 PMCID: PMC10810995 DOI: 10.1007/s11259-023-10210-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 08/28/2023] [Indexed: 09/11/2023]
Abstract
Polyomaviruses are widely distributed viruses of birds that may induce developmental deformities and internal organ disorders primarily in nestlings. In this study, polyomavirus sequence was detected in kidney and liver samples of a common kestrel (Falco tinnunculus) that succumbed at a rescue station in Hungary. The amplified 5025 nucleotide (nt) long genome contained the early (large and small T antigen, LTA and STA) and late (viral proteins, VP1, VP2, VP3) open reading frames (ORFs) typical for polyomaviruses. One of the additional putative ORFs (named VP4) showed identical localization with the VP4 and ORF-X of gammapolyomaviruses, but putative splicing sites could not be found in its sequence. Interestingly, the predicted 123 amino acid (aa) long protein sequence showed the highest similarity with human papillomavirus E4 early proteins in respect of the aa distribution and motif arrangement implying similar functions. The LTA of the kestrel polyomavirus shared <59.2% nt and aa pairwise identity with the LTA sequence of other polyomaviruses and formed a separated branch in the phylogenetic tree among gammapolyomaviruses. Accordingly, the kestrel polyomavirus may be the first member of a novel species within the Gammapolyomavirus genus, tentatively named Gammapolyomavirus faltin.
Affiliation(s)
- Enikő Fehér
  - HUN-REN Veterinary Medical Research Institute, Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, Budapest, Hungary
  - National Laboratory of Virology, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Eszter Kaszab
  - HUN-REN Veterinary Medical Research Institute, Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, Budapest, Hungary
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- János András Mótyán
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Dóra Máté
  - HUN-REN Veterinary Medical Research Institute, Budapest, Hungary
- Krisztina Bali
  - HUN-REN Veterinary Medical Research Institute, Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, Budapest, Hungary
- Márton Hoitsy
  - Conservation and Veterinary Services, Budapest Zoo and Botanical Garden, Budapest, Hungary
  - Department of Exotic Animal and Wildlife Medicine, University of Veterinary Medicine, Budapest, Hungary
- Endre Sós
  - Conservation and Veterinary Services, Budapest Zoo and Botanical Garden, Budapest, Hungary
  - Department of Exotic Animal and Wildlife Medicine, University of Veterinary Medicine, Budapest, Hungary
- Ferenc Jakab
  - National Laboratory of Virology, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Krisztián Bányai
  - HUN-REN Veterinary Medical Research Institute, Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, Budapest, Hungary
  - Department of Pharmacology and Toxicology, University of Veterinary Medicine, Budapest, Hungary
26
Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep 2023; 13:22994. [PMID: 38151513 PMCID: PMC10752908 DOI: 10.1038/s41598-023-50393-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023] Open
Abstract
The role of estrogen receptors (ERs) in breast cancer is of great importance in both clinical practice and scientific exploration. However, around 15-30% of those affected do not see benefits from the usual treatments owing to the innate resistance mechanisms, while 30-40% will gain resistance through treatments. In order to address this problem and facilitate community-wide efforts, machine learning (ML)-based approaches are considered one of the most cost-effective and large-scale identification methods. Herein, we propose a new SMILES-based stacked approach, termed StackER, for the accelerated and efficient identification of ERα and ERβ inhibitors. In StackER, we first established an up-to-date dataset consisting of 1,996 and 1,207 compounds for ERα and ERβ, respectively. Using the up-to-date dataset, StackER explored a wide range of different SMILES-based feature descriptors and ML algorithms in order to generate probabilistic features (PFs). Finally, the selected PFs derived from the two-step feature selection strategy were used for the development of an efficient stacked model. Both cross-validation and independent tests showed that StackER surpassed several conventional ML classifiers and the existing method in precisely predicting ERα and ERβ inhibitors. Remarkably, StackER achieved MCC values of 0.829-0.847 and 0.712-0.786 in terms of the cross-validation and independent tests, respectively, which were 5.92-8.29 and 1.59-3.45% higher than the existing method. In addition, StackER was applied to determine useful features for being ERα and ERβ inhibitors and identify FDA-approved drugs as potential ERα inhibitors in efforts to facilitate drug repurposing. This innovative stacked method is anticipated to facilitate community-wide efforts in efficiently narrowing down ER inhibitor screening.
Affiliation(s)
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
27
Zöller B, Manderstedt E, Lind-Halldén C, Halldén C. Rare-variant collapsing and bioinformatic analyses for amyloidosis, dementia and Parkinson's disease in the UK biobank reveal novel susceptibility loci. Amyloid 2023; 30:442-444. [PMID: 37449354 DOI: 10.1080/13506129.2023.2226299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/02/2023] [Accepted: 06/07/2023] [Indexed: 07/18/2023]
Affiliation(s)
- Bengt Zöller
  - Department of Clinical Sciences, Center for Primary Health Care Research, Lund University and Region Skåne, Malmö, Sweden
- Eric Manderstedt
  - Department of Clinical Sciences, Center for Primary Health Care Research, Lund University and Region Skåne, Malmö, Sweden
- Christina Lind-Halldén
  - Department of Environmental Science and Bioscience, Kristianstad University, Kristianstad, Sweden
- Christer Halldén
  - Department of Clinical Sciences, Center for Primary Health Care Research, Lund University and Region Skåne, Malmö, Sweden
28
Charoenkwan P, Kongsompong S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. TIPred: a novel stacked ensemble approach for the accelerated discovery of tyrosinase inhibitory peptides. BMC Bioinformatics 2023; 24:356. [PMID: 37735626 PMCID: PMC10512532 DOI: 10.1186/s12859-023-05463-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 09/01/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND Tyrosinase is an enzyme involved in melanin production in the skin. Several hyperpigmentation disorders involve the overproduction of melanin and instability of tyrosinase activity resulting in darker, discolored patches on the skin. Therefore, discovering tyrosinase inhibitory peptides (TIPs) is of great significance for basic research and clinical treatments. However, the identification of TIPs using experimental methods is generally cost-ineffective and time-consuming. RESULTS Herein, a stacked ensemble learning approach, called TIPred, is proposed for the accurate and quick identification of TIPs by using sequence information. TIPred explored a comprehensive set of various baseline models derived from well-known machine learning (ML) algorithms and heterogeneous feature encoding schemes from multiple perspectives, such as chemical structure properties, physicochemical properties, and composition information. Subsequently, 130 baseline models were trained and optimized to create new probabilistic features. Finally, the feature selection approach was utilized to determine the optimal feature vector for developing TIPred. Both tenfold cross-validation and independent test methods were employed to assess the predictive capability of TIPred by using the stacking strategy. Experimental results showed that TIPred significantly outperformed the state-of-the-art method in terms of the independent test, with an accuracy of 0.923, MCC of 0.757 and an AUC of 0.977. CONCLUSIONS The proposed TIPred approach could be a valuable tool for rapidly discovering novel TIPs and effectively identifying potential TIP candidates for follow-up experimental validation. Moreover, an online webserver of TIPred is publicly available at http://pmlabstack.pythonanywhere.com/TIPred.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Sasikarn Kongsompong
  - Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
29
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only. Rm-LR incorporates two large-scale RNA language pre-trained models to capture discriminative sequential information and learn local important features, which are subsequently integrated through a bilinear attention network. Rm-LR supports a total of ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms the state-of-the-art methods in terms of predictive capability on benchmark datasets. Experimental results show the effectiveness and superiority of Rm-LR in prediction of various RNA modifications, demonstrating the strong adaptability and robustness of our proposed model. We demonstrate that RNA language pre-trained models can learn dense biological sequential representations from a large-scale long-range RNA corpus while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
30
Charoenkwan P, Waramit S, Chumnanpuen P, Schaduangrat N, Shoombuatong W. TROLLOPE: A novel sequence-based stacked approach for the accelerated discovery of linear T-cell epitopes of hepatitis C virus. PLoS One 2023; 18:e0290538. [PMID: 37624802 PMCID: PMC10456195 DOI: 10.1371/journal.pone.0290538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023] Open
Abstract
Hepatitis C virus (HCV) infection is a concerning health issue that causes chronic liver diseases. Despite many successful therapeutic outcomes, no effective HCV vaccines are currently available. Focusing on T cell activity, the primary effector for HCV clearance, T cell epitopes of HCV (TCE-HCV) are considered promising elements to accelerate HCV vaccine efficacy. Thus, accurate and rapid identification of TCE-HCVs is recommended to obtain more efficient therapy for chronic HCV infection. In this study, a novel sequence-based stacked approach, termed TROLLOPE, is proposed to accurately identify TCE-HCVs from sequence information. Specifically, we employed 12 different sequence-based feature descriptors from heterogeneous perspectives, such as physicochemical properties, composition-transition-distribution information and composition information. These descriptors were used in cooperation with 12 popular machine learning (ML) algorithms to create 144 base-classifiers. To maximize the utility of these base-classifiers, we used a feature selection strategy to determine a collection of potential base-classifiers and integrated them to develop the meta-classifier. Comprehensive experiments based on both cross-validation and independent tests demonstrated the superior predictive performance of TROLLOPE compared with conventional ML classifiers, with cross-validation and independent test accuracies of 0.745 and 0.747, respectively. Finally, a user-friendly online web server of TROLLOPE (http://pmlabqsar.pythonanywhere.com/TROLLOPE) has been developed to serve research efforts in the large-scale identification of potential TCE-HCVs for follow-up experimental verification.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Sajee Waramit
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
31
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
BACKGROUND The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher rates of accuracy and precision. RESULTS In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and Matthews correlation coefficient of 0.866. CONCLUSIONS In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.
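The stacking recipe summarized in this abstract (one baseline model per encoding/algorithm pair, baseline probabilities collected into a probabilistic feature vector, a meta-model trained on that vector) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the two encoders, the two centroid-based learners, the logistic meta-model, and the peptide strings are all hypothetical stand-ins for the 12 encodings and 13 algorithms of the paper, and the feature-selection step is omitted.

```python
import math
from itertools import product


# Hypothetical feature encoders (sequence -> list of floats).
def hydrophobic_fraction(seq):
    return [sum(c in "AILMFVWY" for c in seq) / len(seq)]


def charged_fraction(seq):
    return [sum(c in "DEKR" for c in seq) / len(seq)]


ENCODERS = {"hydrophobic": hydrophobic_fraction, "charged": charged_fraction}


def fit_centroid(X, y, sharpness):
    """Hypothetical learner: nearest-centroid score squashed to a probability."""
    dim = len(X[0])

    def mean(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    pos, neg = mean(1), mean(0)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1.0 / (1.0 + math.exp(-sharpness * (d_neg - d_pos)))

    return predict


LEARNERS = {
    "soft": lambda X, y: fit_centroid(X, y, 5.0),
    "sharp": lambda X, y: fit_centroid(X, y, 50.0),
}


def fit_baselines(seqs, labels):
    """Train one baseline model per (encoder, learner) pair."""
    models = []
    for (e_name, enc), (l_name, fit) in product(ENCODERS.items(), LEARNERS.items()):
        predict = fit([enc(s) for s in seqs], labels)
        models.append((e_name + "+" + l_name, enc, predict))
    return models


def probabilistic_features(models, seq):
    """One predicted probability per baseline model."""
    return [predict(enc(seq)) for _, enc, predict in models]


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-model: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            w = [wi - lr * (p - t) * xi for wi, xi in zip(w, x)]
            b -= lr * (p - t)
    return lambda x: 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


# Toy data (hypothetical): hydrophobic peptides are "positive".
train_seqs = ["AILV", "FVWA", "LLIM", "DEKR", "KRDE", "EEKD"]
train_labels = [1, 1, 1, 0, 0, 0]

models = fit_baselines(train_seqs, train_labels)
# Simplification: the meta-model is trained on in-sample baseline outputs;
# a real pipeline would use cross-validated (out-of-fold) predictions.
meta = fit_logistic([probabilistic_features(models, s) for s in train_seqs], train_labels)

print(len(models), "baseline models")
print(meta(probabilistic_features(models, "VVLA")) > 0.5)
```

With 12 encoders and 13 learners in place of the two of each used here, the same `product` loop would yield the 156 baseline models mentioned in the abstract.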
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
32
Yang R, Liu J, Zhang L. ECAmyloid: An amyloid predictor based on ensemble learning and comprehensive sequence-derived features. Comput Biol Chem 2023; 104:107853. [PMID: 36990028 DOI: 10.1016/j.compbiolchem.2023.107853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 03/17/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Amyloid fibrils formed by the misaggregation of amyloid proteins can lead to neuronal degeneration in Alzheimer's disease. Predicting amyloid proteins not only contributes to understanding the physicochemical properties and formation mechanism of amyloid proteins, but also has significant implications for the treatment of amyloid diseases and the development of new uses for amyloid materials. In this study, an ensemble learning model with sequence-derived features, ECAmyloid, is proposed to identify amyloids. The sequence-derived features, including Pseudo Position-Specific Scoring Matrix (Pse-PSSM), Split Amino Acid Composition (SAAC), Solvent Accessibility (SA), and Secondary Structure Information (SSI), are employed to incorporate sequence composition, evolutionary, and structural information. The individual learners of the ensemble learning model are selected by an increment classifier selection strategy. The final prediction is determined by voting among the predictions of the individual learners. Because the benchmark dataset is imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate positive samples. To eliminate irrelevant and redundant features, correlation-based feature subset (CFS) selection combined with a heuristic search strategy is performed to obtain the optimal feature subset. Experimental results indicate that the ensemble classifier achieves an accuracy of 98.29%, a sensitivity of 0.992, and a specificity of 0.974 on the training dataset using 10-fold cross-validation, far higher than the results obtained by its individual learners. Compared with the original feature set, the accuracy, sensitivity, specificity, MCC, F1-score, and G-Mean of the ensemble method trained on the optimal feature subset are improved by 1.05%, 0.012, 0.01, 0.021, 0.011, and 0.011, respectively. Moreover, comparison with existing methods on the same two independent test datasets demonstrates that the proposed method is an effective and promising predictor for large-scale determination of amyloid proteins. The data and code used to develop ECAmyloid have been shared on GitHub and can be freely downloaded at https://github.com/KOALA-L/ECAmyloid.git.
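The two generic techniques named in this abstract, voting across individual learners and SMOTE-style oversampling of the minority class, can be illustrated with a minimal sketch. This is not the ECAmyloid implementation (that code is in the repository linked above). The function names `majority_vote` and `smote_like`, the tie-breaking rule, the neighbour count `k`, and all toy data are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def majority_vote(learner_predictions):
    """Return one label per sample, chosen by majority vote across learners.

    learner_predictions: list of equal-length lists of 0/1 labels, one per learner.
    Ties go to the positive class (an arbitrary choice made for this sketch).
    """
    voted = []
    for votes in zip(*learner_predictions):
        counts = Counter(votes)
        voted.append(1 if counts[1] >= counts[0] else 0)
    return voted


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by Euclidean distance, excluding the base point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Three hypothetical learners voting on six samples.
votes = majority_vote([
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
])

# Toy two-dimensional minority (positive) class.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(positives, n_new=5)
```

Each synthetic point lies on the segment between two existing minority samples, so oversampling adds positives without leaving the region the real positives already occupy.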
Affiliation(s)
- Runtao Yang: School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Jiaming Liu: School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Lina Zhang: School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
33
|
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform 2023; 15:50. [PMID: 37149650 PMCID: PMC10163717 DOI: 10.1186/s13321-023-00721-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/26/2022] [Accepted: 04/08/2023] [Indexed: 05/08/2023] Open
Abstract
Drug resistance represents a major obstacle to therapeutic innovation and is a prevalent feature of prostate cancer (PCa). Androgen receptors (ARs) are the hallmark therapeutic target in prostate cancer, and AR antagonists have achieved great success. However, the rapid emergence of resistance, which contributes to PCa progression, is the ultimate burden of their long-term use. Hence, the discovery and development of AR antagonists capable of combating this resistance remains an avenue for further exploration. This study therefore proposes a novel deep learning (DL)-based hybrid framework, named DeepAR, to accurately and rapidly identify AR antagonists using only SMILES notation. Specifically, DeepAR is capable of extracting and learning the key information embedded in AR antagonists. First, we established a benchmark dataset by collecting compounds active and inactive against AR from the ChEMBL database. Based on this dataset, we developed and optimized a collection of baseline models using a comprehensive set of well-known molecular descriptors and machine learning algorithms. These baseline models were then used to create probabilistic features. Finally, these probabilistic features were combined and used to construct a meta-model based on a one-dimensional convolutional neural network. Experimental results indicated that DeepAR is a more accurate and stable approach for identifying AR antagonists on the independent test dataset, achieving an accuracy of 0.911 and an MCC of 0.823. In addition, the proposed framework provides feature importance information by leveraging a popular computational approach, SHapley Additive exPlanations (SHAP). Potential AR antagonist candidates were also characterized and analyzed using SHAP waterfall plots and molecular docking. The analysis indicated that N-heterocyclic moieties, halogenated substituents, and a cyano functional group were significant determinants of potential AR antagonists. Lastly, we implemented an online web server for DeepAR (at http://pmlabstack.pythonanywhere.com/DeepAR). We anticipate that DeepAR could be a useful computational tool for facilitating community-wide identification of AR antagonist candidates from a large number of uncharacterized compounds.
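The accuracy and MCC (Matthews correlation coefficient) figures quoted in this abstract are standard summaries of a binary confusion matrix. The sketch below shows how both are computed. The function name `accuracy_and_mcc` and the confusion-matrix counts are invented for illustration and are not taken from the DeepAR paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy and the Matthews correlation coefficient (MCC)
    from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Invented counts: 50 actives (45 found, 5 missed), 50 inactives (40 rejected, 10 false alarms).
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
```

MCC uses all four cells of the confusion matrix, so it stays informative when active and inactive compounds are unevenly represented, a situation in which accuracy alone can look deceptively high.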
Affiliation(s)
- Nalini Schaduangrat: Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nuttapat Anuwongcharoen: Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phasit Charoenkwan: Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong: Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
34
|
Charoenkwan P, Schaduangrat N, Pham NT, Manavalan B, Shoombuatong W. Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ T-cell epitopes of eukaryotic pathogens. Int J Biol Macromol 2023; 238:124228. [PMID: 36996953 DOI: 10.1016/j.ijbiomac.2023.124228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/11/2023] [Accepted: 03/25/2023] [Indexed: 03/31/2023]
Abstract
T-cells recognize antigenic epitopes present on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the large number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8+ TCEs of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8+ TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8+ TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8+ TCEs of eukaryotic pathogens. In particular, Pretoria enables the extraction and exploration of crucial information embedded in CD8+ TCEs by employing a comprehensive set of 12 well-known feature descriptors drawn from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then used to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, a feature selection method was used to determine the important ML classifiers for the construction of the stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8+ TCE prediction; it was superior to several conventional ML classifiers and the existing method on the independent test, with an accuracy of 0.866, an MCC of 0.732, and an AUC of 0.921. Additionally, to maximize user convenience for high-throughput identification of CD8+ TCEs of eukaryotic pathogens, a user-friendly web server for Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
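Of the descriptor groups listed in this abstract, amino acid composition is the simplest to state: for each of the 20 standard residues, the fraction of the peptide it accounts for. The sketch below is a generic version of that descriptor, not Pretoria's own code. The function name `amino_acid_composition`, the residue ordering, and the example peptide are assumptions chosen for illustration.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order (an assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the 20-dimensional amino acid composition of a peptide:
    the fraction of its residues that are each standard amino acid."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


# An arbitrary eight-residue example peptide.
features = amino_acid_composition("SIINFEKL")
```

A fixed-length vector like this can be fed to any standard classifier regardless of peptide length, which is what makes it practical to pair each of the 12 descriptors with each of the 12 algorithms to obtain the pool of 144 classifiers mentioned above.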
Affiliation(s)
- Phasit Charoenkwan: Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Nalini Schaduangrat: Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Nhat Truong Pham: Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Balachandran Manavalan: Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Watshara Shoombuatong: Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
35
|
Sanchez-Pulido L, Ponting CP. OAF: a new member of the BRICHOS family. Bioinformatics Advances 2022; 2:vbac087. [PMID: 36699367 PMCID: PMC9714404 DOI: 10.1093/bioadv/vbac087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/03/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022]
Abstract
Summary: The 10 known BRICHOS domain-containing proteins in humans have been linked to an unusually long list of pathologies, including cancer, obesity and two amyloid-like diseases. BRICHOS domains themselves have been described as intramolecular chaperones that act to prevent amyloid-like aggregation of their proteins' mature polypeptides. Using structural comparison of coevolution-based AlphaFold models and sequence conservation, we identified the Out at First (OAF) protein as a new member of the BRICHOS family in humans. OAF is an experimentally uncharacterized protein that has been proposed as a candidate biomarker for clinical management of coronavirus disease 2019 infections. Our analysis revealed how structural comparison of AlphaFold models can discover remote homology relationships and lead to a better understanding of BRICHOS domain molecular mechanism. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Luis Sanchez-Pulido: MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Chris P Ponting: MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK