1
Li J, Lu X, Jiang K, Tang D, Ning B, Sun F. TARSL: Triple-Attention Cross-Network Representation Learning to Predict Synthetic Lethality for Anti-Cancer Drug Discovery. IEEE J Biomed Health Inform 2025; 29:1680-1691. [PMID: 37603479 DOI: 10.1109/jbhi.2023.3306768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Cancer is a multifaceted disease that results from co-mutations of multiple biological molecules. A promising strategy for cancer therapy involves exploiting the phenomenon of synthetic lethality (SL) by targeting the SL partner of a cancer gene. Since traditional methods for SL prediction are costly, time-consuming, and prone to off-target effects, computational approaches have become an efficient complement to these methods. Most existing approaches treat SL associations as independent of other biological interaction networks and fail to consider information from various biological networks. Although some approaches have integrated different networks to capture multi-modal features of genes for SL prediction, these methods implicitly assume that all sources and levels of information contribute equally to the SL associations. As such, a comprehensive and flexible framework for learning gene cross-network representations for SL prediction is still lacking. In this work, we present a novel Triple-Attention cross-network Representation learning method for SL prediction (TARSL) that captures molecular features from heterogeneous sources. We employ three-level attention modules to account for the different contributions of multi-level information. In particular, feature-level attention captures the correlations between molecular features and network links, node-level attention differentiates the importance of various neighbors, and network-level attention concentrates on important networks and reduces the effects of irrelevant networks. We perform comprehensive experiments on human SL datasets; the results show that our model is consistently superior to baseline methods and that the predicted SL associations could aid in designing anti-cancer drugs.
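To make the network-level attention idea in this abstract concrete, the following is a minimal sketch of attention-weighted fusion of one gene's embeddings from several networks. It is not the TARSL architecture: the dot-product scoring, the `query` vector (which a real model would learn), and the function names are assumptions introduced only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_networks(embeddings, query):
    """Fuse per-network embeddings of one gene with attention weights.

    embeddings: list of equal-length vectors, one per network (hypothetical input).
    query: attention vector; fixed here, learned in a real model (assumption).
    """
    # Score each network by the dot product of its embedding with the query.
    scores = [sum(e_i * q_i for e_i, q_i in zip(e, query)) for e in embeddings]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the per-network embeddings.
    fused = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    return weights, fused

# Three toy networks; the first aligns best with the query and gets the largest weight.
weights, fused = fuse_networks([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [2.0, 0.0])
```

Networks whose embeddings score low against the query receive small weights, which is one simple way a model could reduce the influence of irrelevant networks.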
2
Matrikines as mediators of tissue remodelling. Adv Drug Deliv Rev 2022; 185:114240. [PMID: 35378216 DOI: 10.1016/j.addr.2022.114240] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/08/2021] [Revised: 02/21/2022] [Accepted: 03/26/2022] [Indexed: 11/21/2022]
Abstract
Extracellular matrix (ECM) proteins confer biomechanical properties, maintain cell phenotype and mediate tissue repair (via release of sequestered cytokines and proteases). In contrast to intracellular proteomes, where proteins are monitored and replaced over short time periods, many ECM proteins function for years (decades in humans) without replacement. The longevity of abundant ECM proteins, such as collagen I and elastin, leaves them vulnerable to damage accumulation and their host organs prone to chronic, age-related diseases. However, ECM protein fragmentation can potentially produce peptide cytokines (matrikines) which may exacerbate and/or ameliorate age- and disease-related ECM remodelling. In this review, we discuss ECM composition, function and degradation and highlight examples of endogenous matrikines. We then critically and comprehensively analyse published studies of matrix-derived peptides used as topical skin treatments, before considering the potential for improvements in the discovery and delivery of novel matrix-derived peptides to skin and internal organs. From this, we conclude that while the translational impact of matrix-derived peptide therapeutics is evident, the mechanisms of action of these peptides are poorly defined. Further, well-designed, multimodal studies are required.
3
Marini S, Oliva M, Slizovskiy IB, Das RA, Noyes NR, Kahveci T, Boucher C, Prosperi M. AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data. Gigascience 2022; 11:giac029. [PMID: 35583675 PMCID: PMC9116207 DOI: 10.1093/gigascience/giac029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2021] [Revised: 01/27/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and the presence of homology/homoplasy with other non-AMR genes in sequenced samples. RESULTS: We present AMR-meta, a database-free and alignment-free approach, based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes in reads from metagenomic shotgun sequencing and outputs predictions about whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, further testing their ensemble via a voting system. In cross-validation, AMR-meta has a median f-score of 0.7 (interquartile range, 0.2-0.9). On semi-synthetic metagenomic data (the external test), AMR-meta yields on average a 1.3-fold hit rate increase over existing methods. In terms of run-time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and observed variance of all tools in classification outputs call for further development on standardization of benchmarking data and protocols. CONCLUSIONS: AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols, to fairly compare AMR prediction tools.
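As a rough illustration of the alignment-free starting point described above, a short read can be turned into a fixed-length k-mer count vector without consulting any reference database. This sketch covers only that featurization step; it is not the AMR-meta pipeline, and the value `k=3`, the handling of ambiguous bases, and the function names are assumptions chosen for brevity.

```python
from collections import Counter
from itertools import product

def kmer_counts(read: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a read, skipping k-mers with non-ACGT bases."""
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def kmer_vector(read: str, k: int = 3) -> list[int]:
    """Count vector over all 4**k possible k-mers, in lexicographic order."""
    counts = kmer_counts(read, k)
    vocabulary = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] for kmer in vocabulary]

# Toy read (invented); the N makes three of the ten windows unusable.
vec = kmer_vector("ACGTACGTNACG", k=3)
```

Stacking such vectors for many reads gives a reads-by-k-mers matrix, which is the kind of input a matrix factorization followed by a regularized regression could then operate on.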
Affiliation(s)
- Simone Marini
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, USA
- Marco Oliva
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Ilya B Slizovskiy
  - Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Avenue 225, St. Paul, MN 55108, USA
- Rishabh A Das
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, USA
- Noelle Robertson Noyes
  - Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Avenue 225, St. Paul, MN 55108, USA
- Tamer Kahveci
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Mattia Prosperi
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, USA
4
Qiu Y, Ching WK, Zou Q. Matrix factorization-based data fusion for the prediction of RNA-binding proteins and alternative splicing event associations during epithelial-mesenchymal transition. Brief Bioinform 2021; 22:6354719. [PMID: 34410342 DOI: 10.1093/bib/bbab332] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/07/2021] [Revised: 07/11/2021] [Accepted: 07/29/2021] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION: The epithelial-mesenchymal transition (EMT) is a cellular-developmental process activated during tumor metastasis. Transcriptional regulatory networks controlling EMT are well studied; however, alternative RNA splicing also plays a critical regulatory role during this process. Unfortunately, alternative splicing (AS) and the RNA-binding proteins (RBPs) that regulate it during EMT remain poorly understood. Therefore, a great need exists to develop effective computational methods for predicting associations of RBPs and AS events. Dramatically increasing data sources that have direct and indirect information associated with RBPs and AS events have provided an ideal platform for inferring these associations. RESULTS: In this study, we propose a novel method for RBP-AS target prediction based on weighted data fusion with sparse matrix tri-factorization (WDFSMF for short) that simultaneously decomposes heterogeneous data source matrices into low-rank matrices to reveal hidden associations. WDFSMF can select and integrate data sources by assigning different weights to those sources, and these weights can be assigned automatically. In addition, WDFSMF can identify significant RBP complexes regulating AS events and eliminate noise and outliers from the data. Our proposed method achieves an area under the receiver operating characteristic curve (AUC) of 90.78%, which shows that WDFSMF can effectively predict RBP-AS event associations with higher accuracy compared with previous methods. Furthermore, this study identifies significant RBPs as complexes for AS events during EMT and provides solid ground for further investigation into RNA regulation during EMT and metastasis. WDFSMF is a general data fusion framework, and as such it can also be adapted to predict associations between other biological entities.
Affiliation(s)
- Yushan Qiu
  - College of Mathematics and Statistics, Shenzhen University, 518000 Guangdong, China
- Wai-Ki Ching
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
5
Drug-Target Interaction Prediction Based on Adversarial Bayesian Personalized Ranking. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6690154. [PMID: 33628808 PMCID: PMC7889346 DOI: 10.1155/2021/6690154] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Revised: 01/17/2021] [Accepted: 01/23/2021] [Indexed: 12/13/2022]
Abstract
The prediction of drug-target interaction (DTI) is a key step in drug repositioning. In recent years, many studies have tried to use matrix factorization to predict DTI, but they only use known DTIs and ignore the features of drug and target expression profiles, resulting in limited prediction performance. In this study, we propose a new DTI prediction model named AdvB-DTI. Within this model, the features of drug and target expression profiles are associated with Adversarial Bayesian Personalized Ranking through matrix factorization. First, a set of ternary partial order relationships is generated from the known drug-target relationships. Next, these partial order relationships are used to train the latent factor matrices of drugs and targets using the Adversarial Bayesian Personalized Ranking method, and the matrix factorization is improved by the features of drug and target expression profiles. Finally, the scores of drug-target pairs are obtained as the inner product of latent factors, and the DTI prediction is performed based on the score ranking. The proposed model takes advantage of the idea of learning to rank to overcome the problem of data sparsity, and perturbation factors are introduced to make the model more robust. Experimental results show that our model could achieve a better DTI prediction performance.
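The ranking steps in this abstract (ternary partial orders, latent factors, inner-product scores) can be sketched with plain Bayesian Personalized Ranking on a toy interaction set. This is a simplified illustration under stated assumptions, not AdvB-DTI: it omits the adversarial perturbation and the expression-profile features, and the toy interactions, learning rate, regularization weight, and function names are all invented for the example.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_objective(P, Q, triples):
    """Sum of ln sigmoid(score(d, pos) - score(d, neg)); higher is better."""
    total = 0.0
    for d, pos, neg in triples:
        x = dot(P[d], Q[pos]) - dot(P[d], Q[neg])
        total += -math.log1p(math.exp(-x))
    return total

def bpr_step(P, Q, d, pos, neg, lr=0.05, reg=0.01):
    """One stochastic gradient ascent step for a single (drug, pos, neg) triple."""
    p, qi, qj = P[d][:], Q[pos][:], Q[neg][:]  # copies of the current factors
    x = dot(p, qi) - dot(p, qj)
    g = 1.0 / (1.0 + math.exp(x))  # equals 1 - sigmoid(x)
    for f in range(len(p)):
        P[d][f] += lr * (g * (qi[f] - qj[f]) - reg * p[f])
        Q[pos][f] += lr * (g * p[f] - reg * qi[f])
        Q[neg][f] += lr * (-g * p[f] - reg * qj[f])

# Toy known interactions (invented): drug index -> set of interacting target indices.
known = {0: {0, 1}, 1: {1, 2}, 2: {3}}
n_targets, rank = 4, 4

# Ternary partial orders: a known target should outrank an unobserved one.
triples = [(d, pos, neg)
           for d, targets in known.items()
           for pos in sorted(targets)
           for neg in range(n_targets) if neg not in targets]

rng = random.Random(0)
P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in known]
Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_targets)]

before = bpr_objective(P, Q, triples)
for _ in range(300):
    rng.shuffle(triples)
    for d, pos, neg in triples:
        bpr_step(P, Q, d, pos, neg)
after = bpr_objective(P, Q, triples)
```

After training, candidate targets for a drug `d` can be ranked by `dot(P[d], Q[t])`, which mirrors the score-ranking step the abstract describes.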
6
Nazzicari N, Vella D, Coronnello C, Di Silvestre D, Bellazzi R, Marini S. MTGO-SC, A Tool to Explore Gene Modules in Single-Cell RNA Sequencing Data. Front Genet 2019; 10:953. [PMID: 31649730 PMCID: PMC6794379 DOI: 10.3389/fgene.2019.00953] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/12/2019] [Accepted: 09/05/2019] [Indexed: 01/08/2023] Open
Abstract
The identification of functional modules in gene interaction networks is a key step in understanding biological processes. Network interpretation is essential for unveiling biological mechanisms, candidate biomarkers, or potential targets for drug discovery/repositioning. Plenty of biological module identification algorithms are available, although none is explicitly designed to perform the task on single-cell RNA sequencing (scRNA-seq) data. Here, we introduce MTGO-SC, an adaptation for scRNA-seq of our biological network module detection algorithm MTGO. MTGO-SC isolates gene functional modules by leveraging both the network topological structure and the annotations characterizing the nodes (genes). These annotations are provided by an external source, such as databases and literature repositories (e.g., the Gene Ontology, Reactome). Thanks to the depth of single-cell data, it is possible to define one network for each cell cluster (typically, cell type or state) composing each sample, as opposed to traditional bulk RNA-seq, where the emerging gene network is averaged over the whole sample. MTGO-SC provides two complexity levels for interpretation: the gene-gene interaction and the intermodule interaction networks. MTGO-SC is versatile in letting users define the rules to extract the gene network, and it is integrated with the Seurat scRNA-seq analysis pipeline. MTGO-SC is available at https://github.com/ne1s0n/MTGOsc.
Affiliation(s)
- Nelson Nazzicari
  - Research Centre for Fodder Crops and Dairy Productions, Council for Agricultural Research and Economics (CREA), Lodi, Italy
- Danila Vella
  - Bioengineering Unit, Ri.MED Foundation, Palermo, Italy
  - Istituti Clinici Scientifici Maugeri, Pavia, Italy
- Dario Di Silvestre
  - Institute of Biomedical Technologies, National Research Council, Segrate, Italy
- Riccardo Bellazzi
  - Istituti Clinici Scientifici Maugeri, Pavia, Italy
  - Department of Electrical, Computer and Biomedical Engineering; Centre for Health, Technologies, University of Pavia, Pavia, Italy
- Simone Marini
  - Department of Electrical, Computer and Biomedical Engineering; Centre for Health, Technologies, University of Pavia, Pavia, Italy
  - Department of Surgery, University of Michigan, Ann Arbor, MI, United States
7
Fedonin GG, Eroshkin A, Cieplak P, Matveev EV, Ponomarev GV, Gelfand MS, Ratnikov BI, Kazanov MD. Predictive models of protease specificity based on quantitative protease-activity profiling data. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2019; 1867:140253. [PMID: 31330204 DOI: 10.1016/j.bbapap.2019.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2019] [Revised: 07/09/2019] [Accepted: 07/17/2019] [Indexed: 10/26/2022]
Abstract
Bioinformatics-based prediction of protease substrates can help to elucidate regulatory proteolytic pathways that control a broad range of biological processes such as apoptosis and blood coagulation. The majority of published predictive models are position weight matrices (PWM) reflecting the specificity of proteases toward target sequences. These models are typically derived from experimental data on positions of hydrolyzed peptide bonds and show reasonable predictive power. Newly emerging techniques that not only register the cleavage position but also measure the catalytic efficiency of proteolysis are expected to improve the quality of predictions or at least substantially reduce the number of tested substrates required for confident predictions. The main goal of this study was to develop new prediction models based on such data and to estimate the performance of the constructed models. We used data on catalytic efficiency of proteolysis measured for eight major human matrix metalloproteinases to construct predictive models of protease specificity using a variety of regression analysis techniques. The obtained results suggest that efficiency-based (quantitative) models show performance comparable to conventional PWM-based algorithms while requiring less training data. The derived list of candidate cleavage sites in human secreted proteins may serve as a starting point for experimental analysis.
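For readers unfamiliar with the conventional PWM baseline mentioned in this abstract, the following minimal sketch builds a log-odds position weight matrix from aligned cleavage-site windows and scans a sequence with it. It illustrates the PWM approach only, not the efficiency-based regression models of the study; the four training peptides are invented, and the uniform background, the pseudocount, and the function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(sites, pseudocount=1.0):
    """Log2-odds PWM from equal-length site windows, against a uniform background."""
    width = len(sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm

def score_window(pwm, window):
    """Sum of per-position log-odds; the window must use the 20 standard residues."""
    return sum(column[aa] for column, aa in zip(pwm, window))

def best_site(pwm, sequence):
    """Return (score, start index) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [(score_window(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)

# Hypothetical aligned cleavage-site windows, made up for this example.
training_sites = ["PLGL", "PAGL", "PLGI", "PQGL"]
pwm = build_pwm(training_sites)
top_score, top_index = best_site(pwm, "AAPLGLAA")
```

A PWM of this kind uses only which residues were observed at each position; the quantitative models in the study additionally use how efficiently each substrate was cleaved.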
Affiliation(s)
- Gennady G Fedonin
  - Central Research Institute of Epidemiology, Moscow 111123, Russia
  - A.A. Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny 141700, Russia
- Alexey Eroshkin
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Piotr Cieplak
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Gennady V Ponomarev
  - A.A. Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia
- Mikhail S Gelfand
  - A.A. Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia
  - Skolkovo Institute of Science and Technology, Moscow 121205, Russia
  - National Research University Higher School of Economics, Moscow 101000, Russia
- Boris I Ratnikov
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Marat D Kazanov
  - A.A. Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia
  - Skolkovo Institute of Science and Technology, Moscow 121205, Russia
  - Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117997, Russia
8
Čopar A, Zupan B, Zitnik M. Fast optimization of non-negative matrix tri-factorization. PLoS One 2019; 14:e0217994. [PMID: 31185054 PMCID: PMC6559648 DOI: 10.1371/journal.pone.0217994] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/22/2019] [Accepted: 05/22/2019] [Indexed: 11/18/2022] Open
Abstract
Non-negative matrix tri-factorization (NMTF) is a popular technique for learning low-dimensional feature representations of relational data. Currently, NMTF learns a representation of a dataset through an optimization procedure that typically uses multiplicative update rules. This procedure has had limited success, and its failure cases have not been well understood. Here we perform an empirical study involving six large datasets, comparing multiplicative update rules with three alternative optimization methods: alternating least squares, projected gradients, and coordinate descent. We find that methods based on projected gradients and coordinate descent converge up to twenty-four times faster than multiplicative update rules. Furthermore, the alternating least squares method can quickly train NMTF models on sparse datasets but often fails on dense datasets. Coordinate descent-based NMTF converges up to sixteen times faster compared to well-established methods.
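The multiplicative update rules that serve as the baseline in this abstract can be sketched in a few lines of NumPy. This is a generic, unconstrained variant for minimizing the Frobenius error of X ≈ U S Vᵀ with non-negative factors, written as an assumption about the usual formulation rather than the code used in the study; the function name, the `eps` safeguard, the iteration count, and the synthetic data are illustrative.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Non-negative tri-factorization X ~ U @ S @ V.T via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k1))
    S = rng.random((k1, k2))
    V = rng.random((m, k2))
    for _ in range(n_iter):
        # Each factor is scaled elementwise by a non-negative ratio,
        # so non-negativity is preserved without any projection step.
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
    return U, S, V

def rel_error(X, U, S, V):
    """Relative Frobenius reconstruction error."""
    return np.linalg.norm(X - U @ S @ V.T) / np.linalg.norm(X)

# Synthetic non-negative matrix with an exact rank-3 tri-factor structure.
data_rng = np.random.default_rng(1)
X = data_rng.random((20, 3)) @ data_rng.random((3, 3)) @ data_rng.random((3, 15))

err_before = rel_error(X, *nmtf(X, 3, 3, n_iter=0))  # random initialization only
U, S, V = nmtf(X, 3, 3, n_iter=200)
err_after = rel_error(X, U, S, V)
```

Comparing `err_before` with `err_after` shows how far the updates move from the random start; the alternative optimizers compared in the study (alternating least squares, projected gradients, coordinate descent) target the same objective with different update steps.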
Affiliation(s)
- Andrej Čopar
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Blaž Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States of America
- Marinka Zitnik
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  - Department of Computer Science, Stanford University, Stanford, CA, United States of America
9
Radchenko T, Fontaine F, Morettoni L, Zamora I. Software-aided workflow for predicting protease-specific cleavage sites using physicochemical properties of the natural and unnatural amino acids in peptide-based drug discovery. PLoS One 2019; 14:e0199270. [PMID: 30620739 PMCID: PMC6324806 DOI: 10.1371/journal.pone.0199270] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/25/2018] [Accepted: 12/18/2018] [Indexed: 12/03/2022] Open
Abstract
Peptide drugs have been used in the treatment of multiple pathologies. During peptide discovery, it is crucially important to be able to map the potential protease cleavage sites. This knowledge is later used to chemically modify the peptide drug to adapt it for therapeutic use, making the peptide stable against individual proteases or in complex media. In other cases, the peptide needs to be made specifically unstable toward some proteases, as peptides can be used as a system for targeted drug delivery to specific tissues or cells. Information about proteases, their cleavage sites, and their substrates is widely spread across publications and collected in databases such as MEROPS. Therefore, it is possible to develop models to improve the understanding of potential peptide drug proteolysis. We propose a new workflow to derive protease specificity rules and predict the potential scissile bonds in peptides for individual proteases. WebMetabase stores the information from experimental or external sources in a chemically aware database where each peptide and site of cleavage is represented as a sequence of structural blocks connected by amide bonds and characterized by its physicochemical properties described by Volsurf descriptors. Thus, this methodology can be applied to non-standard amino acids. A frequency analysis can be performed in WebMetabase to discover the most frequent cleavage sites. These results were used to train several models using logistic regression, support vector machine, and ensemble tree classifiers to map cleavage sites for several human proteases from four different families (serine, cysteine, aspartic, and matrix metalloproteases). Finally, we compared the predictive performance of the developed models with other publicly available tools, PROSPERous and SitePrediction.
Affiliation(s)
- Tatiana Radchenko
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
  - * E-mail: (TR); (IZ)
- Ismael Zamora
  - Pompeu Fabra University, Barcelona, Spain
  - Lead Molecular Design, S. L, Sant Cugat del Vallés, Spain
  - * E-mail: (TR); (IZ)