101
|
Xu X, Yue L, Li B, Liu Y, Wang Y, Zhang W, Wang L. DSGAT: predicting frequencies of drug side effects by graph attention networks. Brief Bioinform 2022; 23:6511198. [DOI: 10.1093/bib/bbab586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2021] [Revised: 12/01/2021] [Accepted: 12/20/2021] [Indexed: 12/22/2022] Open
Abstract
Abstract
A critical issue of drug risk–benefit evaluation is to determine the frequencies of drug side effects. Randomized controlled trail is the conventional method for obtaining the frequencies of side effects, while it is laborious and slow. Therefore, it is necessary to guide the trail by computational methods. Existing methods for predicting the frequencies of drug side effects focus on modeling drug–side effect interaction graph. The inherent disadvantage of these approaches is that their performance is closely linked to the density of interactions but which is highly sparse. More importantly, for a cold start drug that does not appear in the training data, such methods cannot learn the preference embedding of the drug because there is no link to the drug in the interaction graph. In this work, we propose a new method for predicting the frequencies of drug side effects, DSGAT, by using the drug molecular graph instead of the commonly used interaction graph. This leads to the ability to learn embeddings for cold start drugs with graph attention networks. The proposed novel loss function, i.e. weighted $\varepsilon$-insensitive loss function, could alleviate the sparsity problem. Experimental results on one benchmark dataset demonstrate that DSGAT yields significant improvement for cold start drugs and outperforms the state-of-the-art performance in the warm start scenario. Source code and datasets are available at https://github.com/xxy45/DSGAT.
Collapse
|
102
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Collapse
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| | - Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
| | - Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| |
Collapse
|
103
|
Zhang P, Wei Z, Che C, Jin B. DeepMGT-DTI: Transformer network incorporating multilayer graph information for Drug-Target interaction prediction. Comput Biol Med 2022; 142:105214. [PMID: 35030496 DOI: 10.1016/j.compbiomed.2022.105214] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 12/26/2021] [Accepted: 01/02/2022] [Indexed: 12/29/2022]
Abstract
Drug-target interaction (DTI) prediction reduces the cost and time of drug development, and plays a vital role in drug discovery. However, most of research does not fully explore the molecular structures of drug compounds in DTI prediction. To this end, we propose a deep learning model to capture the molecular structure information of drug compounds for DTI prediction. This model utilizes a transformer network incorporating multilayer graph information, which captures the features of a drug's molecular structure so that the interactions between atoms of drug compounds can be explored more deeply. At the same time, a convolutional neural network is employed to capture the local residue information in the target sequence, and effectively extract the feature information of the target. The experiments on the DrugBank dataset showed that the proposed model outperformed previous models based on the structure of target sequences. The results indicate that the improved transformer network fuses the feature information between layers in the graph convolutional neural network and extracts the interaction data for the molecular structure. The drug repositioning experiment on COVID-19 and Alzheimer's disease demonstrated the proposed model's ability to find therapeutic drugs in drug discovery. The code of our model is available at https://github.com/zhangpl109/DeepMGT-DTI.
Collapse
Affiliation(s)
- Peiliang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, 116622, China.
| | - Ziqi Wei
- School of Software, Tsinghua University, Beijing, 100084, China.
| | - Chao Che
- Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, 116622, China.
| | - Bo Jin
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, 116024, China.
| |
Collapse
|
104
|
Zhu Y, Ouyang Z, Chen W, Feng R, Chen DZ, Cao J, Wu J. TGSA: protein-protein association-based twin graph neural networks for drug response prediction with similarity augmentation. Bioinformatics 2022; 38:461-468. [PMID: 34559177 DOI: 10.1093/bioinformatics/btab650] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Revised: 08/16/2021] [Accepted: 09/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Drug response prediction (DRP) plays an important role in precision medicine (e.g. for cancer analysis and treatment). Recent advances in deep learning algorithms make it possible to predict drug responses accurately based on genetic profiles. However, existing methods ignore the potential relationships among genes. In addition, similarity among cell lines/drugs was rarely considered explicitly. RESULTS We propose a novel DRP framework, called TGSA, to make better use of prior domain knowledge. TGSA consists of Twin Graph neural networks for Drug Response Prediction (TGDRP) and a Similarity Augmentation (SA) module to fuse fine-grained and coarse-grained information. Specifically, TGDRP abstracts cell lines as graphs based on STRING protein-protein association networks and uses Graph Neural Networks (GNNs) for representation learning. SA views DRP as an edge regression problem on a heterogeneous graph and utilizes GNNs to smooth the representations of similar cell lines/drugs. Besides, we introduce an auxiliary pre-training strategy to remedy the identified limitations of scarce data and poor out-of-distribution generalization. Extensive experiments on the GDSC2 dataset demonstrate that our TGSA consistently outperforms all the state-of-the-art baselines under various experimental settings. We further evaluate the effectiveness and contributions of each component of TGSA via ablation experiments. The promising performance of TGSA shows enormous potential for clinical applications in precision medicine. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/violet-sto/TGSA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yiheng Zhu
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
| | - Zhenqiu Ouyang
- Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
| | - Wenbo Chen
- Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
| | - Ruiwei Feng
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Ji Cao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310000, China
| | - Jian Wu
- Department of Ophthalmology of the Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310000, China
| |
Collapse
|
105
|
Mahajan RA, Shaikh NK, Tikhe TB, Vyas R, Chavan SM. Hybrid Sea Lion Crow Search Algorithm-Based Stacked Autoencoder for Drug Sensitivity Prediction From Cancer Cell Lines. INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH 2022. [DOI: 10.4018/ijsir.304723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Cancer is the most dreadful diseases across world and providing better therapy to cancer patients is still remains as a major challenging task due to drug resistance of tumor cells. This paper proposes a Sea Lion Crow Search Algorithm (SLCSA) for drug sensitivity prediction. The drug sensitivity from cultured cell lines is predicted using stacked autoencoder and proposed SLCSA is derived by combination of Sea Lion Optimization (SLnO) and Crow Search Algorithm (CSA).The implemented approach has offered superior results with maximum value of testing accuracy for normal are 0.920, leukemia is 0.920, NSCLC is 0.912, and urogenital is 0.914.
Collapse
|
106
|
Yang X, Wang W, Ma JL, Qiu YL, Lu K, Cao DS, Wu CK. BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution. Brief Bioinform 2021; 23:6440126. [PMID: 34849567 PMCID: PMC8690188 DOI: 10.1093/bib/bbab491] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2021] [Revised: 10/24/2021] [Accepted: 10/25/2021] [Indexed: 01/09/2023] Open
Abstract
Motivation Understanding chemical–gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. On the contrary, computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions. Results We developed BioNet, a deep biological networkmodel with a graph encoder–decoder architecture. The graph encoder utilizes graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process is featured by two consecutive steps. Then, embedded information learnt by the encoder is then employed to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes, and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm utilizing multiple Graphics Processing Unit (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance with a best area under Receiver Operating Characteristic (ROC) curve of 0.952, which significantly surpasses state-of-theart methods. For further validation, top predicted CGIs of cancer and COVID-19 by BioNet were verified by external curated data and published literature.
Collapse
Affiliation(s)
- Xi Yang
- College of Computer, National University of Defense Technology, China
| | - Wei Wang
- National Supercomputer Center in Tianjin, China
| | - Jing-Lun Ma
- College of Computer, National University of Defense Technology, China
| | - Yan-Long Qiu
- College of Computer, National University of Defense Technology, China
| | - Kai Lu
- College of Computer, National University of Defense Technology, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
| | - Cheng-Kun Wu
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
| |
Collapse
|
107
|
Liu X, Song C, Huang F, Fu H, Xiao W, Zhang W. GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform 2021; 23:6415314. [PMID: 34727569 DOI: 10.1093/bib/bbab457] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2021] [Revised: 09/25/2021] [Accepted: 10/07/2021] [Indexed: 12/29/2022] Open
Abstract
Predicting the response of a cancer cell line to a therapeutic drug is an important topic in modern oncology that can help personalized treatment for cancers. Although numerous machine learning methods have been developed for cancer drug response (CDR) prediction, integrating diverse information about cancer cell lines, drugs and their known responses still remains a great challenge. In this paper, we propose a graph neural network method with contrastive learning for CDR prediction. GraphCDR constructs a graph neural network based on multi-omics profiles of cancer cell lines, the chemical structure of drugs and known cancer cell line-drug responses for CDR prediction, while a contrastive learning task is presented as a regularizer within a multi-task learning paradigm to enhance the generalization ability. In the computational experiments, GraphCDR outperforms state-of-the-art methods under different experimental configurations, and the ablation study reveals the key components of GraphCDR: biological features, known cancer cell line-drug responses and contrastive learning are important for the high-accuracy CDR prediction. The experimental analyses imply the predictive power of GraphCDR and its potential value in guiding anti-cancer drug selection.
Collapse
Affiliation(s)
- Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Congzhi Song
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Feng Huang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Haitao Fu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wenjie Xiao
- Information School, University of Washington, Washington, 98105, USA
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
Collapse
|
108
|
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of machine learning and deep learning algorithms in the recent decade has spurred an outburst of their applications in many research fields. In the chemistry domain, machine learning has been widely used to aid in drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, etc. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which is a crucial element to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review will serve as a brief introduction of the broad field of molecular representations.
Collapse
Affiliation(s)
- Xin An
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Xi Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
109
|
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data being available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progresses have been made in deep learning approach in drug response prediction, deep learning methods show their weakness for predicting the response of a drug that does not appear in the training dataset. In particular, all the five evaluated deep learning methods performed worst than the similarity-regularized matrix factorization (SRMF) method in our drug blind test. We outline the challenges in applying deep learning approach to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
Collapse
Affiliation(s)
- Yurui Chen
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| |
Collapse
|
110
|
Zuo Z, Wang P, Chen X, Tian L, Ge H, Qian D. SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures. BMC Bioinformatics 2021; 22:434. [PMID: 34507532 PMCID: PMC8434731 DOI: 10.1186/s12859-021-04352-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Accepted: 08/31/2021] [Indexed: 12/13/2022] Open
Abstract
Background One of the major challenges in precision medicine is accurate prediction of individual patient’s response to drugs. A great number of computational methods have been developed to predict compounds activity using genomic profiles or chemical structures, but more exploration is yet to be done to combine genetic mutation, gene expression, and cheminformatics in one machine learning model. Results We presented here a novel deep-learning model that integrates gene expression, genetic mutation, and chemical structure of compounds in a multi-task convolutional architecture. We applied our model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on oncology genetics database and L1000 landmark genes, and used their expression and mutations as genomic features in model training. We obtain the cheminformatics features for compounds from PubChem or ChEMBL. Our finding is that combining gene expression, genetic mutation, and cheminformatics features greatly enhances the predictive performance. Conclusion We implemented an extended Graph Neural Network for molecular graphs and Convolutional Neural Network for gene features. With the employment of multi-tasking and self-attention functions to monitor the similarity between compounds, our model outperforms recently published methods using the same training and testing datasets. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04352-9.
Collapse
Affiliation(s)
- Zhaorui Zuo
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
| | - Penglei Wang
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
| | - Xiaowei Chen
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
| | - Li Tian
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
| | - Hui Ge
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China.
| | - Dahong Qian
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China.
| |
Collapse
|
111
|
Peng W, Chen T, Dai W. Predicting Drug Response Based on Multi-omics Fusion and Graph Convolution. IEEE J Biomed Health Inform 2021; 26:1384-1393. [PMID: 34347616 DOI: 10.1109/jbhi.2021.3102186] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Different cancer patients may respond differently to cancer treatment due to the heterogeneity of cancer. It is an urgent task to develop an efficient computational method to identify drug responses in different cell lines, which guides us to design personalized therapy for an individual patient. Hence, we propose an end-to-end algorithm, namely MOFGCN, to predict drug response in cell lines based on Multi-Omics Fusion and Graph Convolution Network. MOFGCN first fuses multiple omics data to calculate the cell line similarity and then constructs a heterogeneous network by combining the cell line similarity, drug similarity, and the known cell line-drug associations. Secondly, it learns the latent features for cancer cell lines and drugs by performing graph convolution operations on the heterogeneous network. Finally, MOFGCN applies the linear correlation coefficient to reconstruct the cancer cell line-drug correlation matrix to predict drug sensitivity. To our knowledge, this is the first attempt to combine graph convolutional neural network and linear correlation coefficient for this significant task. We performed extensive evaluation experiments on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) databases to validate MOFGCNs performance. The experimental results show that MOFGCN is superior to the state-of-the-art algorithms in predicting missing drug responses. It also leads to higher performance in predicting drug responses for new cell lines, new drugs, and targeted drugs.
Collapse
|
112
|
McGrath SP, Benton ML, Tavakoli M, Tatonetti NP. Predictions, Pivots, and a Pandemic: a Review of 2020's Top Translational Bioinformatics Publications. Yearb Med Inform 2021; 30:219-225. [PMID: 34479393 PMCID: PMC8416221 DOI: 10.1055/s-0041-1726540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
OBJECTIVES Provide an overview of the emerging themes and notable papers which were published in 2020 in the field of Bioinformatics and Translational Informatics (BTI) for the International Medical Informatics Association Yearbook. METHODS A team of 16 individuals scanned the literature from the past year. Using a scoring rubric, papers were evaluated on their novelty, importance, and objective quality. 1,224 Medical Subject Headings (MeSH) terms extracted from these papers were used to identify themes and research focuses. The authors then used the scoring results to select notable papers and trends presented in this manuscript. RESULTS The search phase identified 263 potential papers and central themes of coronavirus disease 2019 (COVID-19), machine learning, and bioinformatics were examined in greater detail. CONCLUSIONS When addressing a once in a centruy pandemic, scientists worldwide answered the call, with informaticians playing a critical role. Productivity and innovations reached new heights in both TBI and science, but significant research gaps remain.
Collapse
Affiliation(s)
- Scott P. McGrath
- CITRIS Health, University of California Berkeley, Berkeley, CA, USA
| | | | - Maryam Tavakoli
- MTERMS Lab, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | | |
Collapse
|
113
|
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 85] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022] Open
Abstract
Graph neural networks (GNNs), as a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph structure data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are proposed based on the three levels of structural information that can be learned by GNNs: node classification, link prediction, and graph generation. Meanwhile, according to the specific applications for various omics data, we categorize and discuss the related studies in three aspects: disease prediction, drug discovery, and biomedical imaging. Based on the analysis, we provide an outlook on the shortcomings of current studies and point out their developing prospect. Although GNNs have achieved excellent results in many biological tasks at present, they still face challenges in terms of low-quality data processing, methodology, and interpretability and have a long road ahead. We believe that GNNs are potentially an excellent method that solves various biological problems in bioinformatics research.
Collapse
Affiliation(s)
- Xiao-Meng Zhang
- School of Information, Yunnan Normal University, Kunming, China
| | - Li Liang
- School of Information, Yunnan Normal University, Kunming, China
| | - Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
| | - Ming-Jing Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- School of Life Sciences, Yunnan Normal University, Kunming, China
| |
Collapse
|
114
|
Hinnerichs T, Hoehndorf R. DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug-target interactions. Bioinformatics 2021; 37:4835-4843. [PMID: 34320178 PMCID: PMC8665763 DOI: 10.1093/bioinformatics/btab548] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Revised: 07/14/2021] [Accepted: 07/26/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION In silico drug-target interaction (DTI) prediction is important for drug discovery and drug repurposing. Approaches to predict DTIs can proceed indirectly, top-down, using phenotypic effects of drugs to identify potential drug targets, or they can be direct, bottom-up and use molecular information to directly predict binding affinities. Both approaches can be combined with information about interaction networks. RESULTS We developed DTI-Voodoo as a computational method that combines molecular features and ontology-encoded phenotypic effects of drugs with protein-protein interaction networks, and uses a graph convolutional neural network to predict DTIs. We demonstrate that drug effect features can exploit information in the interaction network whereas molecular features do not. DTI-Voodoo is designed to predict candidate drugs for a given protein; we use this formulation to show that common DTI datasets contain intrinsic biases with major effects on performance evaluation and comparison of DTI prediction methods. Using a modified evaluation scheme, we demonstrate that DTI-Voodoo improves significantly over state of the art DTI prediction methods. AVAILABILITY DTI-Voodoo source code and data necessary to reproduce results are freely available at https://github.com/THinnerichs/DTI-VOODOO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Tilman Hinnerichs
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| |
Collapse
|
115
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 246] [Impact Index Per Article: 61.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insight from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date however only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/ frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Collapse
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
| |
Collapse
|