1
Abbasi M, Carvalho FG, Ribeiro B, Arrais JP. Predicting drug activity against cancer through genomic profiles and SMILES. Artif Intell Med 2024; 150:102820. [PMID: 38553160 DOI: 10.1016/j.artmed.2024.102820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/09/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
Due to the constant increase in cancer rates, the disease has become a leading cause of death worldwide, heightening the need for its detection and treatment. In the era of personalized medicine, the main goal is to incorporate individual variability in order to choose more precisely which therapy and prevention strategies suit each person. However, predicting the sensitivity of tumors to anticancer treatments remains a challenge. In this work, we propose two deep neural network models to predict the impact of anticancer drugs in tumors through the half-maximal inhibitory concentration (IC50). These models join biological and chemical data to capture relevant features of the genetic profile and the drug compounds, respectively. In order to predict the drug response in cancer cell lines, this study employed different deep learning (DL) methods, resorting to Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). In the first stage, two autoencoders were pre-trained with high-dimensional gene expression and mutation data of tumors. Afterward, this genetic background is transferred to the prediction models that return the IC50 value that portrays the potency of a substance in inhibiting a cancer cell line. When comparing RSEM expected counts and TPM as methods for displaying gene expression data, RSEM has been shown to perform better in deep models, and CNN models can obtain better insight into these types of data. Moreover, the obtained results reflect the effectiveness of the extracted deep representations in the prediction of the IC50 value, achieving a mean squared error of 1.06 and surpassing previous state-of-the-art models.
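The mean squared error quoted in the abstract above is the average squared gap between measured and predicted response values. The following is a minimal sketch of that metric only; the numbers are hypothetical, and the scale on which the authors compute the error (for example, log-transformed IC50) is an assumption not stated in the abstract.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between measured and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical response values for four cell line/drug pairs (not data from the paper).
measured = [2.0, -1.0, 0.5, 3.0]
predicted = [1.0, -1.0, 1.5, 2.0]

print(mean_squared_error(measured, predicted))
```

A lower value means the predicted responses sit closer to the measured ones; the metric penalises large individual errors more heavily than small ones.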
Affiliation(s)
- Maryam Abbasi
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal; Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal.
- Filipa G Carvalho
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
- Bernardete Ribeiro
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
- Joel P Arrais
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
2
Pereira TO, Abbasi M, Oliveira RI, Guedes RA, Salvador JAR, Arrais JP. Artificial intelligence for prediction of biological activities and generation of molecular hits using stereochemical information. J Comput Aided Mol Des 2023; 37:791-806. [PMID: 37847342 PMCID: PMC10618333 DOI: 10.1007/s10822-023-00539-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 10/02/2023] [Indexed: 10/18/2023]
Abstract
In this work, we develop a method for generating targeted hit compounds by applying deep reinforcement learning and attention mechanisms to predict binding affinity against a biological target while considering stereochemical information. The novelty of this work is a deep model Predictor that can establish the relationship between chemical structures and their corresponding [Formula: see text] values. We thoroughly study the effect of different molecular descriptors such as ECFP4, ECFP6, SMILES and RDKFingerprint. We also demonstrate the importance of attention mechanisms to capture long-range dependencies in molecular sequences. Due to the importance of stereochemical information for the binding mechanism, this information was employed both in the prediction and generation processes. To identify the most promising hits, we apply a self-adaptive multi-objective optimization strategy. Moreover, to ensure the existence of stereochemical information, we consider all the possible enumerated stereoisomers to provide the most appropriate 3D structures. We evaluated this approach against the Ubiquitin-Specific Protease 7 (USP7) by generating putative inhibitors for this target. The predictor that uses SMILES notation as the descriptor, combined with a bidirectional recurrent neural network with an attention mechanism, performs best. Additionally, our methodology identifies the regions of the generated molecules that are important for the interaction with the receptor's active site. The obtained results also demonstrate that it is possible to discover synthesizable molecules with high biological affinity for the target, containing the indication of their optimal stereochemical conformation.
Affiliation(s)
- Tiago O Pereira
  - Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal.
- Maryam Abbasi
  - Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
  - Applied Research Institute, Polytechnic Institute of Coimbra, Coimbra, Portugal
  - Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal
- Rita I Oliveira
  - Laboratory of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, Coimbra, Portugal
- Romina A Guedes
  - Laboratory of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, Coimbra, Portugal
- Jorge A R Salvador
  - Laboratory of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, Coimbra, Portugal
- Joel P Arrais
  - Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
3
Pereira TO, Abbasi M, Arrais JP. Enhancing reinforcement learning for de novo molecular design applying self-attention mechanisms. Brief Bioinform 2023; 24:bbad368. [PMID: 37903414 DOI: 10.1093/bib/bbad368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/04/2023] [Accepted: 09/26/2023] [Indexed: 11/01/2023] Open
Abstract
The drug discovery process can be significantly improved by applying deep reinforcement learning (RL) methods that learn to generate compounds with desired pharmacological properties. Nevertheless, RL-based methods typically condense the evaluation of sampled compounds into a single scalar value, making it difficult for the generative agent to learn the optimal policy. This work combines self-attention mechanisms and RL to generate promising molecules. The idea is to evaluate the relative significance of each atom and functional group in their interaction with the target, and to utilize this information for optimizing the Generator. Therefore, the framework for de novo drug design is composed of a Generator that samples new compounds combined with a Transformer-encoder and a biological affinity Predictor that evaluate the generated structures. Moreover, it takes advantage of the knowledge encapsulated in the Transformer's attention weights to evaluate each token individually. We compared the performance of two output prediction strategies for the Transformer: standard and masked language model (MLM). The results show that the MLM Transformer is more effective in optimizing the Generator compared with the state-of-the-art works. Additionally, the evaluation models identified the most important regions of each molecule for the biological interaction with the target. As a case study, we generated synthesizable hit compounds that can be putative inhibitors of the enzyme ubiquitin-specific protease 7 (USP7).
Affiliation(s)
- Tiago O Pereira
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
- Maryam Abbasi
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
- Joel P Arrais
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
4
Monteiro NRC, Pereira TO, Machado ACD, Oliveira JL, Abbasi M, Arrais JP. FSM-DDTR: End-to-end feedback strategy for multi-objective De Novo drug design using transformers. Comput Biol Med 2023; 164:107285. [PMID: 37557054 DOI: 10.1016/j.compbiomed.2023.107285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/05/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
The design of compounds that target specific biological functions with relevant selectivity is critical in the context of drug discovery, especially due to the polypharmacological nature of most existing drug molecules. In recent years, in silico-based methods combined with deep learning have shown promising results in the de novo drug design challenge, leading to potential leads for biologically interesting targets. However, several of these methods overlook the importance of certain properties, such as validity rate and target selectivity, or simplify the generative process by neglecting the multi-objective nature of the pharmacological space. In this study, we propose a multi-objective Transformer-based architecture to generate drug candidates with desired molecular properties and increased selectivity toward a specific biological target. The framework consists of a Transformer-Decoder Generator that generates novel and valid compounds in the SMILES format notation, a Transformer-Encoder Predictor that estimates the binding affinity toward the biological target, and a feedback loop combined with a multi-objective optimization strategy to rank the generated molecules and condition the generating distribution around the targeted properties. The results demonstrate that the proposed architecture can generate novel and synthesizable small compounds with desired pharmacological properties toward a biologically relevant target. The unbiased Transformer-based Generator achieved superior performance in the novelty rate (97.38%) and comparable performance in terms of internal diversity, uniqueness, and validity against state-of-the-art baselines. The optimization of the unbiased Transformer-based Generator resulted in the generation of molecules exhibiting high binding affinity toward the Adenosine A2A Receptor (AA2AR) and possessing desirable physicochemical properties, where 99.36% of the generated molecules follow Lipinski's rule of five. 
Furthermore, the implementation of a feedback strategy, in conjunction with a multi-objective algorithm, effectively shifted the distribution of the generated molecules toward optimal values of molecular weight, molecular lipophilicity, topological polar surface area, synthetic accessibility score, and quantitative estimate of drug-likeness, without the necessity of prior training sets comprising molecules endowed with pharmacological properties of interest. Overall, this research study validates the applicability of a Transformer-based architecture in the context of drug design, capable of exploring the vast chemical representation space to generate novel molecules with improved pharmacological properties and target selectivity. The data and source code used in this study are available at: https://github.com/larngroup/FSM-DDTR.
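The abstract above reports the share of generated molecules that follow Lipinski's rule of five. As a rough illustration of how such a share can be computed, the sketch below applies the four classic thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) to precomputed descriptor values. The `Descriptors` class, the `max_violations` parameter, and the example values are hypothetical and are not taken from the FSM-DDTR code; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def fraction_compliant(batch, max_violations=0):
    """Share of molecules with at most `max_violations` violations."""
    if not batch:
        raise ValueError("batch must be non-empty")
    ok = sum(1 for d in batch if lipinski_violations(d) <= max_violations)
    return ok / len(batch)


# Illustrative descriptor values only; they do not describe real generated molecules.
batch = [
    Descriptors(320.4, 2.1, 2, 5),
    Descriptors(612.7, 5.8, 3, 9),
    Descriptors(450.0, 4.9, 5, 10),
    Descriptors(498.0, 3.0, 6, 4),
]

print(fraction_compliant(batch))                    # strict: no violations allowed
print(fraction_compliant(batch, max_violations=1))  # lenient: one violation tolerated
```

Whether a single violation is tolerated varies between studies, which is why the threshold is exposed as a parameter here rather than fixed.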
Affiliation(s)
- Nelson R C Monteiro
  - University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
- Tiago O Pereira
  - University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
- Ana Catarina D Machado
  - University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
- José L Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal.
- Maryam Abbasi
  - University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal.
- Joel P Arrais
  - University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
5
Torres L, Arrais JP, Ribeiro B. Few-shot learning via graph embeddings with convolutional networks for low-data molecular property prediction. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08403-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
Graph neural networks and convolutional architectures have proven to be pivotal in improving the prediction of molecular properties in drug discovery. However, this is fundamentally a low data problem that is incompatible with regular deep learning approaches. Contemporary deep networks require large amounts of training data, which severely limits the prediction of new molecular entities from limited available data. In this paper, we address the challenge of low data in molecular property prediction by: (1) defining a set of deep learning architectures that accept compound chemical structures in the form of molecular graphs, (2) creating a few-shot learning strategy across graph neural networks and convolutional neural networks to leverage the rich information of graph embeddings, and (3) proposing a two-module meta-learning framework to learn from task-transferable knowledge and predict molecular properties on few-shot data. Furthermore, we conduct multiple experiments on two benchmark multiproperty datasets to demonstrate a superior performance over conventional graph-based baselines. ROC-AUC results for 10-shot experiments show an average improvement of +11.37% on Tox21 and +0.53% on SIDER, which are representative small-sized biological datasets for molecular property prediction.
6
Abbasi M, Santos BP, Pereira TC, Sofa R, Monteiro NRC, Simões CJV, Brito RMM, Ribeiro B, Oliveira JL, Arrais JP. Correction to: Designing optimized drug candidates with Generative Adversarial Network. J Cheminform 2022; 14:53. [PMID: 35953869 PMCID: PMC9367066 DOI: 10.1186/s13321-022-00631-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Maryam Abbasi
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal.
- Beatriz P Santos
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal
- Tiago C Pereira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Raul Sofa
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal
- Nelson R C Monteiro
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal
- Carlos J V Simões
  - BSIM Therapeutics, Instituto Pedro Nunes, 3030-199, Coimbra, Portugal
- Rui M M Brito
  - BSIM Therapeutics, Instituto Pedro Nunes, 3030-199, Coimbra, Portugal
  - Department of Chemistry, Universidade de Coimbra, CQC-IMS, 3004-535, Coimbra, Portugal
- Bernardete Ribeiro
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal
- José L Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, Portugal
7
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
8
Pereira T, Abbasi M, Oliveira RI, Guedes RA, Salvador JAR, Arrais JP. Deep generative model for therapeutic targets using transcriptomic disease-associated data-USP7 case study. Brief Bioinform 2022; 23:6628785. [PMID: 35789255 DOI: 10.1093/bib/bbac270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/06/2022] [Revised: 05/24/2022] [Accepted: 06/09/2022] [Indexed: 12/24/2022] Open
Abstract
The generation of candidate hit molecules with the potential to be used in cancer treatment is a challenging task. In this context, computational methods based on deep learning have been employed to improve in silico drug design methodologies. Nonetheless, the applied strategies have focused solely on the chemical aspect of the generation of compounds, disregarding the likely biological consequences for the organism's dynamics. Herein, we propose a method to implement targeted molecular generation that employs biological information, namely, disease-associated gene expression data, to conduct the process of identifying interesting hits. When applied to the generation of USP7 putative inhibitors, the framework managed to generate promising compounds, with more than 90% of them containing drug-like properties and essential active groups for the interaction with the target. Hence, this work provides a novel and reliable method for generating new promising compounds focused on the biological context of the disease.
Affiliation(s)
- Tiago Pereira
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
- Maryam Abbasi
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
- Rita I Oliveira
  - Laboratory of Pharmaceutical Chemistry Faculty of Pharmacy, Univ Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology Center for Innovative Biomedicine and Biotechnology, Univ Coimbra, Coimbra, Portugal
- Romina A Guedes
  - Laboratory of Pharmaceutical Chemistry Faculty of Pharmacy, Univ Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology Center for Innovative Biomedicine and Biotechnology, Univ Coimbra, Coimbra, Portugal
- Jorge A R Salvador
  - Laboratory of Pharmaceutical Chemistry Faculty of Pharmacy, Univ Coimbra, Coimbra, Portugal
  - Center for Neuroscience and Cell Biology Center for Innovative Biomedicine and Biotechnology, Univ Coimbra, Coimbra, Portugal
- Joel P Arrais
  - Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
9
Abbasi M, Santos BP, Pereira TC, Sofia R, Monteiro NRC, Simões CJV, Brito R, Ribeiro B, Oliveira JL, Arrais JP. Designing optimized drug candidates with Generative Adversarial Network. J Cheminform 2022; 14:40. [PMID: 35754029 PMCID: PMC9233801 DOI: 10.1186/s13321-022-00623-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/01/2022] [Accepted: 06/13/2022] [Indexed: 12/03/2022] Open
Abstract
Drug design is an important area of study for pharmaceutical businesses. However, low efficacy, off-target delivery, time consumption, and high cost are challenges and can create barriers that impact this process. Deep Learning models are emerging as a promising solution to perform de novo drug design, i.e., to generate drug-like molecules tailored to specific needs. However, stereochemistry was not explicitly considered in the generated molecules, which is inevitable in targeted-oriented molecules. This paper proposes a framework based on Feedback Generative Adversarial Network (GAN) that includes optimization strategy by incorporating Encoder-Decoder, GAN, and Predictor deep models interconnected with a feedback loop. The Encoder-Decoder converts the string notations of molecules into latent space vectors, effectively creating a new type of molecular representation. At the same time, the GAN can learn and replicate the training data distribution and, therefore, generate new compounds. The feedback loop is designed to incorporate and evaluate the generated molecules according to the multiobjective desired property at every epoch of training to ensure a steady shift of the generated distribution towards the space of the targeted properties. Moreover, to develop a more precise set of molecules, we also incorporate a multiobjective optimization selection technique based on a non-dominated sorting genetic algorithm. The results demonstrate that the proposed framework can generate realistic, novel molecules that span the chemical space. The proposed Encoder-Decoder model correctly reconstructs 99% of the datasets, including stereochemical information. The model's ability to find uncharted regions of the chemical space was successfully shown by optimizing the unbiased GAN to generate molecules with a high binding affinity to the Kappa Opioid and Adenosine [Formula: see text] receptor. 
Furthermore, the generated compounds exhibit high internal and external diversity (0.88 and 0.94, respectively) as well as high uniqueness.
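The abstract above mentions a multiobjective selection technique based on a non-dominated sorting genetic algorithm. The sketch below illustrates only the non-dominated sorting idea, by repeatedly peeling off Pareto fronts from a set of scored candidates. It is a simplified, quadratic-time illustration rather than the authors' implementation: it omits crowding distance and the genetic operators, assumes every objective is to be maximised, and uses hypothetical scores.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective
    and strictly better in at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(scores):
    """Group candidate indices into successive Pareto fronts.

    Front 0 holds candidates that no other candidate dominates; front 1
    holds those dominated only by members of front 0, and so on.
    """
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (predicted affinity, drug-likeness) pairs for five generated molecules.
scores = [(7.5, 0.6), (6.0, 0.9), (7.0, 0.5), (5.0, 0.4), (8.0, 0.3)]

print(non_dominated_fronts(scores))
```

Candidates in the first front represent different trade-offs between the objectives, so a selection step can keep them all instead of collapsing the objectives into one weighted score.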
Affiliation(s)
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Beatriz P. Santos
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Tiago C. Pereira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Raul Sofia
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Nelson R. C. Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Rui Brito
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
- Bernardete Ribeiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L. Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P. Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
10
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted by providing explanations to the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data.
Results: The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and on the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches.
Conclusions: This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Affiliation(s)
- Nelson R C Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
- Henrique V Ávila
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
11
Monteiro NRC, Ribeiro B, Arrais JP. Drug-Target Interaction Prediction: End-to-End Deep Learning Approach. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2364-2374. [PMID: 32142454 DOI: 10.1109/tcbb.2020.2977335] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
The discovery of potential Drug-Target Interactions (DTIs) is a determining step in the drug discovery and repositioning process, as the effectiveness of the currently available antibiotic treatment is declining. Despite the effort put into traditional in vivo and in vitro methods, pharmaceutical financial investment has been reduced over the years. Therefore, establishing effective computational methods is decisive to find new leads in a reasonable amount of time. Successful approaches have been presented to solve this problem, but protein sequences and structured data are seldom used together. In this paper, we present a deep learning architecture model, which exploits the particular ability of Convolutional Neural Networks (CNNs) to obtain 1D representations from protein sequences (amino acid sequence) and compound SMILES (Simplified Molecular Input Line Entry System) strings. These representations can be interpreted as features that express local dependencies or patterns that can then be used in a Fully Connected Neural Network (FCNN), acting as a binary classifier. The results achieved demonstrate that using CNNs to obtain representations of the data, instead of the traditional descriptors, leads to improved performance. The proposed end-to-end deep learning method outperformed traditional machine learning approaches in the correct classification of both positive and negative interactions.
12
Pereira T, Abbasi M, Ribeiro B, Arrais JP. Diversity oriented Deep Reinforcement Learning for targeted molecule generation. J Cheminform 2021; 13:21. [PMID: 33750461 PMCID: PMC7944916 DOI: 10.1186/s13321-021-00498-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/13/2020] [Accepted: 02/22/2021] [Indexed: 11/10/2022] Open
Abstract
In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. Then, the Generator is optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process that seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model that will remain fixed and a copy of the previous one that will be updated during the training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules with the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized coefficient of partition and also high inhibitory power against the Adenosine [Formula: see text] and [Formula: see text] opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the desired direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
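The two-Generator exploratory strategy described above can be pictured as choosing, at every token step, whether the fixed model or the updated copy proposes the next token. The sketch below is a toy illustration under stated assumptions: the generators are stand-in callables rather than neural networks, and `update_explore_prob` is a hypothetical rule, since the abstract says only that the evolution of the reward decides how often each Generator is used and does not give the formula.

```python
import random


def sample_sequence(fixed_gen, updated_gen, explore_prob, max_len, rng):
    """Build a token sequence, picking one of two generators at each step.

    With probability `explore_prob` the fixed (initially trained) generator
    proposes the next token; otherwise the updated copy does.
    """
    tokens = []
    for _ in range(max_len):
        gen = fixed_gen if rng.random() < explore_prob else updated_gen
        token = gen(tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens


def update_explore_prob(explore_prob, prev_reward, new_reward, step=0.05):
    """Hypothetical schedule (an assumption, not the paper's rule):
    rely more on the updated generator while the reward improves,
    and fall back towards the fixed generator when it stalls."""
    if new_reward > prev_reward:
        explore_prob -= step
    else:
        explore_prob += step
    return min(1.0, max(0.0, explore_prob))


# Stand-in generators: each ignores the prefix and emits a constant token.
def fixed_gen(prefix):
    return "C"


def updated_gen(prefix):
    return "N"


rng = random.Random(0)
print(sample_sequence(fixed_gen, updated_gen, 0.5, 8, rng))
```

In a real setting each generator would return a token sampled from its predicted distribution given the prefix, and the reward would come from the Predictor's score for the completed molecule.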
Affiliation(s)
- Tiago Pereira
  - Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Maryam Abbasi
  - Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Bernardete Ribeiro
  - Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Joel P. Arrais
  - Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
|
13
|
Cruz A, Machado P, Arrais JP. CroP-Coordinated Panel visualization for biological networks analysis. Bioinformatics 2020; 36:1298-1299. [PMID: 31504214 DOI: 10.1093/bioinformatics/btz688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2019] [Revised: 08/12/2019] [Accepted: 08/30/2019] [Indexed: 11/14/2022] Open
Abstract
SUMMARY CroP is a data visualization application that focuses on the analysis of relational data that changes over time. While it was specifically designed for addressing the preeminent need to interpret large scale time series from gene expression studies, CroP is prepared to analyze datasets from multiple contexts. Multiple datasets can be uploaded simultaneously and viewed through dynamic visualization models, which are contained within flexible panels that allow users to adapt the workspace to their data. Through clustering and the time curve visualization it is possible to quickly identify groups of data points with similar properties or behaviors, as well as temporal patterns across all points, such as periodic waves of expression. Additionally, it integrates a public biomedical database for gene annotation. CroP will be of major interest to biologists who seek to extract relations from complex sets of data. AVAILABILITY AND IMPLEMENTATION CroP is freely available for download as an executable jar at https://cdv.dei.uc.pt/crop/.
Affiliation(s)
- António Cruz
  - CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra 3030-290, Portugal
- Penousal Machado
  - CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra 3030-290, Portugal
- Joel P Arrais
  - CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra 3030-290, Portugal
|
14
|
Cruz A, Arrais JP, Machado P. Interactive and coordinated visualization approaches for biological data analysis. Brief Bioinform 2019; 20:1513-1523. [PMID: 29590305 DOI: 10.1093/bib/bby019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/30/2017] [Revised: 01/24/2018] [Indexed: 12/11/2022] Open
Abstract
The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein-protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.
Affiliation(s)
- António Cruz
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Joel P Arrais
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Penousal Machado
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
|
15
|
Coelho ED, Arrais JP, Oliveira JL. Uncovering microbial duality within human microbiomes: A novel algorithm for the analysis of host-pathogen interactions. Annu Int Conf IEEE Eng Med Biol Soc 2016; 2015:3254-7. [PMID: 26736986 DOI: 10.1109/embc.2015.7319086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Microbial species thrive within human hosts by establishing complex associations between themselves and the host. Even though species diversity can be measured (alpha- and beta-diversity), a methodology to estimate the impact of microorganisms in human pathways is still lacking. In this work we propose a computational approach to estimate which human pathways are targeted the most by microorganisms, while also identifying which microorganisms are prominent in this targeting. Our results were consistent with literature evidence, and thus we propose this methodology as a new prospective approach to be used for screening potentially impacted pathways.
|
16
|
Coelho ED, Santiago AM, Arrais JP, Oliveira JL. Computational methodology for predicting the landscape of the human–microbial interactome region level influence. J Bioinform Comput Biol 2015; 13:1550023. [DOI: 10.1142/s0219720015500237] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/28/2022]
Abstract
Microbial communities thrive in close association among themselves and with the host, establishing protein–protein interactions (PPIs) with the latter, and thus being able to benefit (positively impact) or disturb (negatively impact) biological events in the host. Despite major collaborative efforts to sequence the Human microbiome, there is still a great lack of understanding of its impact. We propose a computational methodology to predict the impact of microbial proteins in human biological events, taking into account the abundance of each microbial protein and its relation to all other microbial and human proteins. This alternative methodology is centered on an improved impact estimation algorithm that integrates PPIs between human and microbial proteins with Reactome pathway data. This methodology was applied to study the impact of 24 microbial phyla over different cellular events, within 10 different human microbiomes. The results obtained confirm findings already described in the literature and explore new ones. We believe the Human microbiome can no longer be ignored, as there is enough evidence correlating not only microbiome alterations with disease states, but also the reversal of these alterations with the return to healthy states.
Affiliation(s)
- Edgar D. Coelho
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- André M. Santiago
  - Department of Informatics Engineering (DEI), Centre for Informatics and Systems of the University of Coimbra (CISUC), University of Coimbra, Polo2, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
- Joel P. Arrais
  - Department of Informatics Engineering (DEI), Centre for Informatics and Systems of the University of Coimbra (CISUC), University of Coimbra, Polo2, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
- José Luís Oliveira
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
|
17
|
Coelho ED, Arrais JP, Matos S, Pereira C, Rosa N, Correia MJ, Barros M, Oliveira JL. Computational prediction of the human-microbial oral interactome. BMC Syst Biol 2014; 8:24. [PMID: 24576332 PMCID: PMC3975954 DOI: 10.1186/1752-0509-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/27/2013] [Accepted: 02/17/2014] [Indexed: 11/12/2022]
Abstract
BACKGROUND The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that the microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, this kind of PPIs is still largely undisclosed. To elucidate these interactions, we have created a computational prediction method that allows us to obtain a first model of the Human-Microbial oral interactome. RESULTS We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC-curve (AUC) value greater than 0.926, supporting our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After subjecting our dataset to the prediction model, the classified result was filtered for very high confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored. CONCLUSIONS We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug-targets and biomarkers. The dataset used for training and validation, the predictions obtained and the final network are available at http://bioinformatics.ua.pt/software/oralint.
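To illustrate how a naïve Bayes model can pool several evidence sources and then apply a very strict probability cut-off like the one reported above, here is a minimal sketch. It assumes each evidence source contributes a likelihood ratio and that sources are conditionally independent (the naïve Bayes assumption); the prior, the likelihood ratios, and the function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def posterior_log_odds(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Naive Bayes combination: posterior odds equal the prior odds times
    the product of per-source likelihood ratios. Working in log space
    avoids overflow and keeps the sum numerically stable."""
    log_odds = math.log(prior / (1.0 - prior))
    return log_odds + sum(math.log(lr) for lr in likelihood_ratios)


def high_confidence(pairs: Dict[str, Sequence[float]], prior: float,
                    eps: float = 1e-7) -> List[str]:
    """Keep protein pairs whose posterior probability is at least 1 - eps.
    The test is done on log-odds because probabilities this close to 1
    lose precision as floating-point numbers."""
    threshold = math.log((1.0 - eps) / eps)
    return [name for name, lrs in pairs.items()
            if posterior_log_odds(prior, lrs) >= threshold]


# Hypothetical likelihood ratios from five evidence sources per pair.
candidates = {
    "A-B": [100.0, 100.0, 100.0, 100.0, 100.0],
    "C-D": [2.0, 3.0, 1.0, 0.5, 4.0],
}
selected = high_confidence(candidates, prior=0.01)
```

Comparing log-odds instead of probabilities is a design choice for this sketch: a posterior of 1 − 10⁻⁷ corresponds to log-odds of about 16.1, which is easy to compare exactly, whereas the probability itself sits very close to the limit of what a float can distinguish from 1.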
Affiliation(s)
- Edgar D Coelho
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Department of Informatics Engineering (DEI), University of Coimbra, Coimbra, Portugal
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
- Sérgio Matos
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Carlos Pereira
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
  - Department of Informatics Engineering and Systems, Polytechnic Institute of Coimbra, Engineering Institute of Coimbra (IPC-ISEC), Coimbra, Portugal
- Nuno Rosa
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Maria José Correia
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Marlene Barros
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
  - Centre for Neurosciences and Cell Biology, University of Coimbra, Coimbra, Portugal
- José Luís Oliveira
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
|
18
|
Reboiro-Jato M, Arrais JP, Oliveira JL, Fdez-Riverola F. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification. BMC Bioinformatics 2014; 15:31. [PMID: 24475928 PMCID: PMC3909759 DOI: 10.1186/1471-2105-15-31] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/18/2012] [Accepted: 01/27/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. RESULTS geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. CONCLUSIONS geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Affiliation(s)
- Florentino Fdez-Riverola
  - Escuela Superior de Ingeniería Informática, Universidade de Vigo, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain.
|
19
|
Coelho ED, Arrais JP, Oliveira JL. From Protein-Protein Interactions to Rational Drug Design: Are Computational Methods Up to the Challenge? Curr Top Med Chem 2013; 13:602-18. [DOI: 10.2174/1568026611313050005] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 01/22/2013] [Revised: 02/15/2013] [Accepted: 03/09/2013] [Indexed: 11/22/2022]
|
20
|
Arrais JP, Fernandes J, Pereira J, Oliveira JL. GeneBrowser 2: an application to explore and identify common biological traits in a set of genes. BMC Bioinformatics 2010; 11:389. [PMID: 20663121 PMCID: PMC2919517 DOI: 10.1186/1471-2105-11-389] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/09/2010] [Accepted: 07/21/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The development of high-throughput laboratory techniques created a demand for computer-assisted result analysis tools. Many of these techniques return lists of genes whose interpretation requires finding relevant biological roles for the problem at hand. The required information is typically available in public databases, and usually, this information must be manually retrieved to complement the analysis. This process is a very time-consuming task that should be automated as much as possible. RESULTS GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualisation and analysis methods to help identify the most relevant and common biological characteristics. The functionalities provided include the following: a central point with the most relevant biological information for each inserted gene; a list of the most related papers in PubMed and gene expression studies in ArrayExpress; and an extended approach to functional analysis applied to Gene Ontology, homologies, gene chromosomal localisation and pathways. CONCLUSIONS GeneBrowser provides a unique entry point to several visualisation and analysis methods, providing fast and easy analysis of a set of genes. GeneBrowser fills the gap between Web portals that analyse one gene at a time and functional analysis tools that are limited in scope and usually desktop-based.
Affiliation(s)
- Joel P Arrais
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
|
21
|
Matos S, Arrais JP, Maia-Rodrigues J, Oliveira JL. Concept-based query expansion for retrieving gene related publications from MEDLINE. BMC Bioinformatics 2010; 11:212. [PMID: 20426836 PMCID: PMC2873540 DOI: 10.1186/1471-2105-11-212] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 07/01/2009] [Accepted: 04/28/2010] [Indexed: 11/10/2022] Open
Abstract
Background Advances in biotechnology and in high-throughput methods for gene analysis have contributed to an exponential increase in the number of scientific publications in these fields of study. While much of the data and results described in these articles are entered and annotated in the various existing biomedical databases, the scientific literature is still the major source of information. There is, therefore, a growing need for text mining and information retrieval tools to help researchers find the relevant articles for their study. To tackle this, several tools have been proposed to provide alternative solutions for specific user requests. Results This paper presents QuExT, a new PubMed-based document retrieval and prioritization tool that, from a given list of genes, searches for the most relevant results from the literature. QuExT follows a concept-oriented query expansion methodology to find documents containing concepts related to the genes in the user input, such as protein and pathway names. The retrieved documents are ranked according to user-definable weights assigned to each concept class. By changing these weights, users can modify the ranking of the results in order to focus on documents dealing with a specific concept. The method's performance was evaluated using data from the 2004 TREC genomics track, producing a mean average precision of 0.425, with an average of 4.8 and 31.3 relevant documents within the top 10 and 100 retrieved abstracts, respectively. Conclusions QuExT implements a concept-based query expansion scheme that leverages gene-related information available on a variety of biological resources. The main advantage of the system is to give the user control over the ranking of the results by means of a simple weighting scheme. Using this approach, researchers can effortlessly explore the literature regarding a group of genes and focus on the different aspects relating to these genes.
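The user-weighted ranking described in this abstract can be illustrated with a small sketch. The abstract does not give QuExT's scoring formula, so the weighted sum below is an assumption, and the concept classes, document IDs, and match counts are made-up example data.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each document ID, how many expanded-query
# concepts of each class (gene, protein, pathway, ...) it mentions.
ConceptCounts = Dict[str, Dict[str, int]]


def rank_documents(matches: ConceptCounts,
                   weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each document as a weighted sum of its concept matches and
    return (doc_id, score) pairs, best first. Ties break by document ID."""
    scored = []
    for doc_id, counts in matches.items():
        # Classes without a user-supplied weight contribute nothing.
        score = sum(weights.get(cls, 0.0) * n for cls, n in counts.items())
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


matches = {
    "doc_a": {"gene": 3, "protein": 0, "pathway": 1},
    "doc_b": {"gene": 1, "protein": 2, "pathway": 0},
    "doc_c": {"gene": 0, "protein": 1, "pathway": 3},
}

# The same retrieved set, ranked under two different user preferences.
by_gene = rank_documents(matches, {"gene": 1.0, "protein": 0.5, "pathway": 0.5})
by_pathway = rank_documents(matches, {"gene": 0.2, "protein": 0.2, "pathway": 1.0})
```

With these example numbers, emphasizing the gene class puts `doc_a` first, while emphasizing pathways moves `doc_c` to the top; the retrieved set is unchanged and only the ordering shifts, which is the kind of user control the abstract describes.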
Affiliation(s)
- Sérgio Matos
  - Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
|