1
Mall R, Singh A, Patel CN, Guirimand G, Castiglione F. VISH-Pred: an ensemble of fine-tuned ESM models for protein toxicity prediction. Brief Bioinform 2024; 25:bbae270. [PMID: 38842509 PMCID: PMC11154842 DOI: 10.1093/bib/bbae270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/06/2024] [Accepted: 05/23/2024] [Indexed: 06/07/2024]
Abstract
Peptide- and protein-based therapeutics are becoming a promising treatment regimen for myriad diseases. Toxicity of proteins is the primary hurdle for protein-based therapies. Thus, there is an urgent need for accurate in silico methods for identifying toxic proteins to filter the pool of potential candidates. At the same time, it is imperative to precisely identify non-toxic proteins to expand the possibilities for protein-based biologics. To address this challenge, we proposed an ensemble framework, called VISH-Pred, comprising models built by fine-tuning ESM2 transformer models on a large, experimentally validated, curated dataset of protein and peptide toxicities. The primary steps in the VISH-Pred framework are to estimate protein toxicities efficiently from the protein sequence alone, to employ an undersampling technique to handle the severe class imbalance in the data, and to learn representations from fine-tuned ESM2 protein language models, which are then fed to machine learning techniques such as LightGBM and XGBoost. The VISH-Pred framework correctly identifies both peptides/proteins with potential toxicity and non-toxic proteins, achieving Matthews correlation coefficients of 0.737, 0.716 and 0.322 and F1-scores of 0.759, 0.696 and 0.713 on three non-redundant blind tests, respectively, outperforming other methods by over 10% on these quality metrics. Moreover, VISH-Pred achieved the best accuracy and area under the receiver operating characteristic curve scores on these independent test sets, highlighting the robustness and generalization capability of the framework. By making VISH-Pred available as an easy-to-use web server, we expect it to serve as a valuable asset for future endeavors aimed at discerning the toxicity of peptides and enabling efficient protein-based therapeutics.
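For readers unfamiliar with the two quality metrics quoted in this abstract, the following is a minimal sketch of how the Matthews correlation coefficient and the F1-score are computed from binary confusion-matrix counts. The function name `mcc_and_f1` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def mcc_and_f1(tp, fp, tn, fn):
    """Return (MCC, F1) for binary confusion-matrix counts.

    Both metrics fall back to 0.0 when their denominator is zero.
    """
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return mcc, f1


# Hypothetical counts for an imbalanced test set (NOT results from the paper):
# 100 toxic and 900 non-toxic sequences.
mcc, f1 = mcc_and_f1(tp=80, fp=20, tn=880, fn=20)
print(f"MCC={mcc:.3f}  F1={f1:.3f}")
```

Unlike F1, the Matthews correlation coefficient also uses the true-negative count, which is one reason it is often reported alongside F1 on imbalanced data such as toxic versus non-toxic sequences.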
Affiliation(s)
- Raghvendra Mall
  - Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
- Ankita Singh
  - Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
- Chirag N Patel
  - Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
- Gregory Guirimand
  - Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, 657-8501, Japan
- Filippo Castiglione
  - Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
  - Institute for Applied Computing, National Research Council of Italy, Via dei Taurini, 19, 00185, Rome, Italy
2
Mall R, Bynigeri RR, Karki R, Malireddi RKS, Sharma B, Kanneganti TD. Pancancer transcriptomic profiling identifies key PANoptosis markers as therapeutic targets for oncology. NAR Cancer 2022; 4:zcac033. [PMID: 36329783 PMCID: PMC9623737 DOI: 10.1093/narcan/zcac033] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/07/2022] [Revised: 10/03/2022] [Accepted: 10/28/2022] [Indexed: 11/24/2022]
Abstract
Resistance to programmed cell death (PCD) is a hallmark of cancer. While some PCD components are prognostic in cancer, the roles of many molecules can be masked by redundancies and crosstalk between PCD pathways, impeding the development of targeted therapeutics. Recent studies characterizing these redundancies have identified PANoptosis, a unique innate immune-mediated inflammatory PCD pathway that integrates components from other PCD pathways. Here, we designed a systematic computational framework to determine the pancancer clinical significance of PANoptosis and identify targetable biomarkers. We found that high expression of PANoptosis genes was detrimental in low-grade glioma (LGG) and kidney renal cell carcinoma (KIRC). ZBP1, ADAR, CASP2, CASP3, CASP4, CASP8 and GSDMD expression consistently had negative effects on prognosis in LGG across multiple survival models, while AIM2, CASP3, CASP4 and TNFRSF10 expression had negative effects in KIRC. Conversely, high expression of PANoptosis genes was beneficial in skin cutaneous melanoma (SKCM), with ZBP1, NLRP1, CASP8 and GSDMD expression consistently having positive prognostic effects. As a therapeutic proof-of-concept, we treated melanoma cells with a combination therapy that activates ZBP1 and showed that this treatment induced PANoptosis. Overall, through our systematic framework, we identified and validated key innate immune biomarkers from PANoptosis that can be targeted to improve patient outcomes in cancers.
Affiliation(s)
- Raghvendra Mall
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Ratnakar R Bynigeri
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Rajendra Karki
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Bhesh Raj Sharma
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
3
M-GWNN: Multi-granularity graph wavelet neural networks for semi-supervised node classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
4
5
Huang M, Zou G, Zhang B, Liu Y, Gu Y, Jiang K. Overlapping community detection in heterogeneous social networks via the user model. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.11.055] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/30/2022]
6
7
Abstract
The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections, and to deep learning extensions such as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models. A key element is to first characterize these kernel machines in terms of so-called conjugate feature duality, yielding a representation with visible and hidden units. It is shown how this is related to the energy form in restricted Boltzmann machines, with continuous variables in a nonprobabilistic setting. In this new framework of so-called restricted kernel machine (RKM) representations, the dual variables correspond to hidden features. Deep RKMs are obtained by coupling RKMs. The method is illustrated for a deep RKM consisting of three levels: a least squares support vector machine regression level and two kernel PCA levels. In its primal form, deep feedforward neural networks can also be trained within this framework.
8
Mall R, Cerulo L, Bensmail H, Iavarone A, Ceccarelli M. Detection of statistically significant network changes in complex biological networks. BMC SYSTEMS BIOLOGY 2017; 11:32. [PMID: 28259158 PMCID: PMC5336651 DOI: 10.1186/s12918-017-0412-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/23/2016] [Accepted: 02/22/2017] [Indexed: 01/10/2023]
Abstract
Background: Biological networks help to unveil the complex structure of molecular interactions and to discover driver genes, especially in the context of cancer. Gene mutations, for example as cancer progresses, can cause the gene expression network to undergo some amount of localized re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have recently been proposed to detect sub-network differences in pairwise labeled weighted networks.
Methods: In this paper, we propose an improvement over the state-of-the-art based on the Generalized Hamming Distance, adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance and is more efficient in terms of computational time and prediction accuracy than methods in the literature. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation.
Results: In the case of dense random geometric networks, the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa value than the state-of-the-art. We also report the application of the method to dissect the difference between the regulatory networks of IDH-mutant versus IDH-wild-type glioma cancer. In this case, our method is able to identify some recently reported master regulators as well as novel important candidates.
Conclusions: We show that our network differencing procedure can effectively and efficiently detect statistically significant network re-wirings in different conditions. When applied to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors, it correctly selects sub-networks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single network-based approaches.
Electronic supplementary material: The online version of this article (doi:10.1186/s12918-017-0412-6) contains supplementary material, which is available to authorized users.
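As a rough illustration of the idea in this abstract (a distance between two labeled networks plus a significance estimate), the sketch below computes a Generalized-Hamming-style distance and a permutation p-value. It rests on several assumptions: the distance is taken to be the mean squared difference of mean-centered off-diagonal edge weights, it is applied to raw adjacency weights, and significance is estimated by naive node-label shuffling. The names `ghd` and `permutation_p_value` are hypothetical; this is not the authors' algorithm or their model selection criterion.

```python
import random


def ghd(a, b):
    """Mean squared difference of mean-centered off-diagonal weights
    (an assumed form of a Generalized-Hamming-style distance)."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    mean_a = sum(a[i][j] for i, j in pairs) / len(pairs)
    mean_b = sum(b[i][j] for i, j in pairs) / len(pairs)
    return sum(
        ((a[i][j] - mean_a) - (b[i][j] - mean_b)) ** 2 for i, j in pairs
    ) / len(pairs)


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Share of node-label shuffles of b that are at least as close to a
    as the observed labeling (one-sided; the direction is an assumption)."""
    rng = random.Random(seed)
    n = len(a)
    observed = ghd(a, b)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if ghd(a, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy 4-node networks (illustrative only): a path 0-1-2-3, and the same
# network with edge 2-3 re-wired to 1-3.
net_a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
net_b = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]
distance = ghd(net_a, net_b)
p_value = permutation_p_value(net_a, net_b, n_perm=200, seed=1)
```

A brute-force permutation loop like this scales poorly with network size, which gives some intuition for why the abstract emphasizes computational time and parallelization.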
Affiliation(s)
- Raghvendra Mall
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
- Luigi Cerulo
  - Department of Science and Technology, University of Sannio, Benevento, Italy
  - BioGeM, Institute of Genetic Research "Gaetano Salvatore", Ariano Irpino (AV), Italy
- Halima Bensmail
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
- Antonio Iavarone
  - Department of Neurology, Department of Pathology, Institute for Cancer Genetics, Columbia University Medical Center, New York, USA
- Michele Ceccarelli
  - QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
  - Department of Science and Technology, University of Sannio, Benevento, Italy
9
Langone R, Suykens JA. Supervised aggregated feature learning for multiple instance classification. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.09.060] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
10
11
Blanco MR, Martin JS, Kahlscheuer ML, Krishnan R, Abelson J, Laederach A, Walter NG. Single Molecule Cluster Analysis dissects splicing pathway conformational dynamics. Nat Methods 2015; 12:1077-84. [PMID: 26414013 DOI: 10.1038/nmeth.3602] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 09/04/2014] [Accepted: 08/10/2015] [Indexed: 02/01/2023]
Abstract
We report Single Molecule Cluster Analysis (SiMCAn), which utilizes hierarchical clustering of hidden Markov modeling-fitted single-molecule fluorescence resonance energy transfer (smFRET) trajectories to dissect the complex conformational dynamics of biomolecular machines. We used this method to study the conformational dynamics of a precursor mRNA during the splicing cycle as carried out by the spliceosome. By clustering common dynamic behaviors derived from selectively blocked splicing reactions, SiMCAn was able to identify the signature conformations and dynamic behaviors of multiple ATP-dependent intermediates. In addition, it identified an open conformation adopted late in splicing by a 3' splice-site mutant, invoking a mechanism for substrate proofreading. SiMCAn enables rapid interpretation of complex single-molecule behaviors and should prove useful for the comprehensive analysis of a plethora of dynamic cellular machines.
Affiliation(s)
- Mario R Blanco
  - Department of Chemistry, Single Molecule Analysis Group, University of Michigan, Ann Arbor, Michigan, USA
  - Cellular and Molecular Biology, University of Michigan, Ann Arbor, Michigan, USA
- Joshua S Martin
  - Biology Department, University of North Carolina, Chapel Hill, North Carolina, USA
- Matthew L Kahlscheuer
  - Department of Chemistry, Single Molecule Analysis Group, University of Michigan, Ann Arbor, Michigan, USA
- Ramya Krishnan
  - Department of Chemistry, Single Molecule Analysis Group, University of Michigan, Ann Arbor, Michigan, USA
- John Abelson
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, California, USA
- Alain Laederach
  - Biology Department, University of North Carolina, Chapel Hill, North Carolina, USA
- Nils G Walter
  - Department of Chemistry, Single Molecule Analysis Group, University of Michigan, Ann Arbor, Michigan, USA
12
Abstract
Real-world complex networks are dynamic in nature and change over time. The change is usually observed in the interactions within the network. Complex networks exhibit community-like structures. A key feature of the dynamics of complex networks is the evolution of communities over time. Several methods have been proposed to detect and track the evolution of these groups. However, there is no generic tool which visualizes all the aspects of group evolution in dynamic networks, including birth, death, splitting, merging, expansion, shrinkage and continuation of groups. In this paper, we propose Netgram: a tool for visualizing the evolution of communities in time-evolving graphs. Netgram maintains the evolution of communities over two consecutive timestamps in tables, which are used to create a query database using the SQL outer-join operation. It uses a line-based visualization technique which adheres to certain design principles and aesthetic guidelines. Netgram uses a greedy solution to order the initial community information provided by the evolutionary clustering technique so that the visualization has fewer line cross-overs. This makes it easier to track the progress of individual communities in time-evolving graphs. Netgram is a generic toolkit which can be used with any evolutionary community detection algorithm, as illustrated in our experiments. We use Netgram to visualize topic evolution in the NIPS conference over a period of 11 years and observe the emergence and merging of several disciplines in the field of information processing systems.
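The outer-join step mentioned in this abstract can be pictured with a small sketch: community memberships at two consecutive timestamps are joined on node id, so that nodes present at only one timestamp still appear, paired with `None`. The data layout (a dict from node id to community id) and the function names below are illustrative assumptions, not Netgram's actual schema or code.

```python
from collections import Counter


def outer_join_memberships(prev, curr):
    """Full outer join on node id; a missing side is reported as None."""
    nodes = sorted(set(prev) | set(curr))
    return [(node, prev.get(node), curr.get(node)) for node in nodes]


def transition_counts(prev, curr):
    """Count nodes moving from each previous community to each current one."""
    rows = outer_join_memberships(prev, curr)
    return Counter((p, c) for _, p, c in rows)


# Hypothetical memberships at timestamps t and t+1 (node id -> community id).
members_t = {"a": 1, "b": 1, "c": 2, "d": 2}
members_t1 = {"a": 1, "b": 1, "c": 1, "e": 3}

rows = outer_join_memberships(members_t, members_t1)
counts = transition_counts(members_t, members_t1)
```

In this toy case, the pair `(2, 1)` records a node moving from community 2 into community 1, `(2, None)` a node that left the network, and `(None, 3)` a newly arrived node forming community 3; counts like these are the kind of information from which events such as merging, shrinkage and birth could be read off.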
13
Mall R, Mehrkanoon S, Suykens JA. Identifying intervals for hierarchical clustering using the Gershgorin circle theorem. Pattern Recognit Lett 2015. [DOI: 10.1016/j.patrec.2014.12.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]