151
Zeibich R, Kwan P, O’Brien TJ, Perucca P, Ge Z, Anderson A. Applications for Deep Learning in Epilepsy Genetic Research. Int J Mol Sci 2023; 24:14645. [PMID: 37834093 PMCID: PMC10572791 DOI: 10.3390/ijms241914645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 09/11/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023] Open
Abstract
Epilepsy is a group of brain disorders characterised by an enduring predisposition to generate unprovoked seizures. Fuelled by advances in sequencing technologies and computational approaches, more than 900 genes have now been implicated in epilepsy. The development and optimisation of tools and methods for analysing the vast quantity of genomic data is a rapidly evolving area of research. Deep learning (DL) is a subset of machine learning (ML) that brings opportunity for novel investigative strategies that can be harnessed to gain new insights into the genomic risk of people with epilepsy. DL is being harnessed to address limitations in the accuracy of long-read sequencing technologies, which improve on short-read methods. Tools that predict the functional consequences of genetic variation can break new ground in addressing critical knowledge gaps, while methods that integrate independent but complementary data enhance the predictive power of genetic data. We provide an overview of these DL tools and discuss how they may be applied to the analysis of genetic data for epilepsy research.
Affiliation(s)
- Robert Zeibich
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Patrick Kwan
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Terence J. O’Brien
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Piero Perucca
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
- Bladin-Berkovic Comprehensive Epilepsy Program, Department of Neurology, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
- Zongyuan Ge
- Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Monash-Airdoc Research, Monash University, Melbourne, VIC 3800, Australia
- Alison Anderson
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
152
Vaculík O, Chalupová E, Grešová K, Majtner T, Alexiou P. Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes. Biology 2023; 12:1276. [PMID: 37886986 PMCID: PMC10604046 DOI: 10.3390/biology12101276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, making the prediction of their binding sites of high importance. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Additionally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
Affiliation(s)
- Ondřej Vaculík
- Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Eliška Chalupová
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Katarína Grešová
- Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Tomáš Majtner
- Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60439 Frankfurt am Main, Germany
- Panagiotis Alexiou
- Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
- Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, MSD 2080 Msida, Malta
- Centre for Molecular Medicine & Biobanking, University of Malta, MSD 2080 Msida, Malta
153
Zhang Q, Cao L, Song H, Lin K, Pang E. MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome. Brief Bioinform 2023; 24:bbad367. [PMID: 37833843 PMCID: PMC10576019 DOI: 10.1093/bib/bbad367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 08/31/2023] [Accepted: 09/26/2023] [Indexed: 10/15/2023] Open
Abstract
Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we propose a novel method, MkcDBGAS, to identify all seven types of AS events using the transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic and mixed k-mers to identify bubbles generated by AS with precision higher than 98.17% and to detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS adds exon motifs to construct a feature matrix, followed by an XGBoost-based classifier with a classification accuracy greater than 93.40%, which outperforms other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and the mouse transcriptome. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data is available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field.
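As an illustrative aside, and not the MkcDBGAS implementation, the idea that an alternative splicing event appears as a bubble in a colored de Bruijn graph can be sketched in a few lines. The sketch assumes a single fixed k instead of the dynamic and mixed k-mers described in the abstract; the function names and the two toy transcripts (one with a middle exon, one with that exon skipped) are hypothetical.

```python
from collections import defaultdict

def build_colored_dbg(transcripts, k):
    """Build a de Bruijn graph whose edges carry 'colors' (transcript names).

    Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]][kmer[1:]].add(name)
    return graph

def bubble_openings(graph):
    """Return nodes with more than one successor, i.e. candidate bubble starts."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Hypothetical toy isoforms: "skipped" lacks the middle exon ACCTGA of "full".
transcripts = {
    "full": "ATGGCGTACCTGAGTTCAGC",
    "skipped": "ATGGCGTGTTCAGC",
}
graph = build_colored_dbg(transcripts, k=4)
openings = bubble_openings(graph)

# Show which isoform follows which branch at each candidate bubble start.
for node in openings:
    for successor, colors in sorted(graph[node].items()):
        print(node, "->", successor, sorted(colors))
```

In this toy case the two isoforms share a path up to the exon boundary and then diverge, each branch carrying a single color. Classifying the event type from such a bubble, as the abstract's second module does, is outside the scope of the sketch.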
Affiliation(s)
- Quanbao Zhang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Lei Cao
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Hongtao Song
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
154
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes from the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we address the problem of GRN inference by framing it as a graph link prediction task. We propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the inferred GRN is obtained by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
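The encode-then-complete pattern described in this abstract can be illustrated with a minimal, dependency-free sketch. This is not the GNNLink code: it assumes a single standard graph convolution layer with fixed (untrained) weights and a symmetric dot-product decoder standing in for the matrix completion step, and every name and number below is hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the propagation matrix of a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[a_hat[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

def link_scores(embeddings):
    """Score every gene pair with a sigmoid of the embedding dot product."""
    transposed = [list(col) for col in zip(*embeddings)]
    logits = matmul(embeddings, transposed)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]

# Toy prior network over three genes: gene 0 is linked to gene 1; gene 2 is isolated.
prior = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
# Hypothetical preprocessed expression features (genes x cells).
expression = [[1.0, 0.5], [0.8, 0.6], [0.1, 0.9]]
# Fixed weights for illustration only; a real model would learn these.
weights = [[0.5, -0.2], [0.1, 0.4]]

embeddings = gcn_layer(normalize_adjacency(prior), expression, weights)
scores = link_scores(embeddings)
for row in scores:
    print([round(s, 3) for s in row])
```

A dot-product decoder gives symmetric scores, so it cannot express the direction of regulation; a decoder with separate regulator and target projections would be one way to relax that assumption.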
Affiliation(s)
- Guo Mao
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
155
Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023; 24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] Open
Abstract
RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine-learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
Affiliation(s)
- Marc Horlacher
- Computational Health Center, Helmholtz Center Munich, Germany
- School of Computation, Information and Technology, Technical University Munich (TUM), Germany
- Giulia Cantini
- Computational Health Center, Helmholtz Center Munich, Germany
- Julian Hesse
- Computational Health Center, Helmholtz Center Munich, Germany
- Patrick Schinke
- Computational Health Center, Helmholtz Center Munich, Germany
- Nicolas Goedert
- Computational Health Center, Helmholtz Center Munich, Germany
- Lambert Moyon
- Computational Health Center, Helmholtz Center Munich, Germany
156
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 PMCID: PMC11554572 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in their posttranscriptional regulation of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based only on molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we provide a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
- Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
- Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
157
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low sample sizes (34-286 subjects), high dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Wan Xiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
- Yu Yang Jiang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
- Yan Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
- Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China.
- Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China.
158
Halawani R, Buchert M, Chen YPP. Deep learning exploration of single-cell and spatially resolved cancer transcriptomics to unravel tumour heterogeneity. Comput Biol Med 2023; 164:107274. [PMID: 37506451 DOI: 10.1016/j.compbiomed.2023.107274] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/02/2023] [Revised: 07/03/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
Tumour heterogeneity is one of the critical confounding aspects in decoding tumour growth. Malignant cells display variations in their gene transcription profiles and mutation spectra even when originating from a single progenitor cell. Single-cell and spatial transcriptomics sequencing have recently emerged as key technologies for unravelling tumour heterogeneity. Single-cell sequencing promotes individual cell-type identification through transcriptome-wide gene expression measurements of each cell. Spatial transcriptomics facilitates identification of cell-cell interactions and the structural organization of heterogeneous cells within a tumour tissue through associating spatial RNA abundance of cells at distinct spots in the tissue section. However, extracting features and analyzing single-cell and spatial transcriptomics data poses challenges. Single-cell transcriptome data is extremely noisy and its sparse nature and dropouts can lead to misinterpretation of gene expression and the misclassification of cell types. Deep learning predictive power can overcome data challenges, provide high-resolution analysis and enhance precision oncology applications that involve early cancer prognosis, diagnosis, patient survival estimation and anti-cancer therapy planning. In this paper, we provide a background to and review of the recent progress of deep learning frameworks to investigate tumour heterogeneity using both single-cell and spatial transcriptomics data types.
Affiliation(s)
- Raid Halawani
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Michael Buchert
- School of Cancer Medicine, La Trobe University, Melbourne, Victoria, Australia; Olivia Newton-John Cancer Research Institute, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia.
159
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Alzheimer's disease (AD) is a brain disorder regarded as degenerative because its symptoms worsen over time. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNP biomarkers associated with AD in order to perform a reliable classification of AD. In contrast to existing related works, we utilize deep transfer learning with varying experimental analysis for reliable classification of AD. For this purpose, a convolutional neural network (CNN) is first trained on the genome-wide association studies (GWAS) dataset requested from the Alzheimer's Disease Neuroimaging Initiative. We then employ deep transfer learning to further train our CNN (as the base model) on a different AD GWAS dataset, to extract the final set of features. The extracted features are then fed into a Support Vector Machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The statistical outcomes indicate an accuracy of 89%, which is a significant improvement when benchmarked against existing related works.
160
Guo W, Hu Y, Qian J, Zhu L, Cheng J, Liao J, Fan X. Laser capture microdissection for biomedical research: towards high-throughput, multi-omics, and single-cell resolution. J Genet Genomics 2023; 50:641-651. [PMID: 37544594 DOI: 10.1016/j.jgg.2023.07.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/28/2023] [Revised: 07/28/2023] [Accepted: 07/28/2023] [Indexed: 08/08/2023]
Abstract
Spatial omics technologies have become powerful methods to provide valuable insights into cells and tissues within a complex context, significantly enhancing our understanding of the intricate and multifaceted biological system. With an increasing focus on spatial heterogeneity, there is a growing need for unbiased, spatially resolved omics technologies. Laser capture microdissection (LCM) is a cutting-edge method for acquiring spatial information that can quickly collect regions of interest (ROIs) from heterogeneous tissues, with resolutions ranging from single cells to cell populations. Thus, LCM has been widely used for studying the cellular and molecular mechanisms of diseases. This review focuses on the differences among four types of commonly used LCM technologies and their applications in omics and disease research. Key attributes of application cases are also highlighted, such as throughput and spatial resolution. In addition, we comprehensively discuss the existing challenges and the great potential of LCM in biomedical research, disease diagnosis, and targeted therapy from the perspective of high-throughput, multi-omics, and single-cell resolution.
Affiliation(s)
- Wenbo Guo
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China
- Yining Hu
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China
- Jingyang Qian
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China
- Lidan Zhu
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China
- Junyun Cheng
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China
- Jie Liao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, Zhejiang 314100, China.
161
Li J, Kang G, Wang J, Yuan H, Wu Y, Meng S, Wang P, Zhang M, Wang Y, Feng Y, Huang H, de Marco A. Affinity maturation of antibody fragments: A review encompassing the development from random approaches to computational rational optimization. Int J Biol Macromol 2023; 247:125733. [PMID: 37423452 DOI: 10.1016/j.ijbiomac.2023.125733] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/03/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 07/11/2023]
Abstract
Routinely screened antibody fragments usually require further in vitro maturation to achieve the desired biophysical properties. Blind in vitro strategies can produce improved ligands by introducing random mutations into the original sequences and selecting the resulting clones under increasingly stringent conditions. Rational approaches take an alternative perspective: they aim first to identify the specific residues potentially involved in the control of biophysical mechanisms, such as affinity or stability, and then to evaluate which mutations could improve those characteristics. Understanding antigen-antibody interactions is instrumental to this process, whose reliability consequently depends strongly on the quality and completeness of the structural information. Recently, methods based on deep learning have markedly improved the speed and accuracy of model building and are promising tools for accelerating the docking step. Here, we review the features of the available bioinformatic instruments and analyze the reports illustrating the results obtained by applying them to optimize antibody fragments, and nanobodies in particular. Finally, the emerging trends and open questions are summarized.
Affiliation(s)
- Jiaqi Li
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
| | - Guangbo Kang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
| | - Jiewen Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
| | - Haibin Yuan
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
| | - Yili Wu
- Zhejiang Provincial Clinical Research Center for Mental Disorders, School of Mental Health and the Affiliated Kangning Hospital, Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Wenzhou Medical University, Oujiang Laboratory, Wenzhou, Zhejiang 325035, China
| | - Shuxian Meng
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
| | - Ping Wang
- New Technology R&D Department, Tianjin Modern Innovative TCM Technology Company Limited, Tianjin 300392, China
| | - Miao Zhang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; China Resources Biopharmaceutical Company Limited, Beijing 100029, China
| | - Yuli Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Tianjin Pharmaceutical Da Ren Tang Group Corporation Limited, Traditional Chinese Pharmacy Research Institute, Tianjin Key Laboratory of Quality Control in Chinese Medicine, Tianjin 300457, China; State Key Laboratory of Drug Delivery Technology and Pharmacokinetics, Tianjin Institute of Pharmaceutical Research, Tianjin 300193, China
| | - Yuanhang Feng
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
| | - He Huang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
| | - Ario de Marco
- Laboratory for Environmental and Life Sciences, University of Nova Gorica, Nova Gorica, Slovenia.
| |
|
162
|
Froń A, Semianiuk A, Lazuk U, Ptaszkowski K, Siennicka A, Lemiński A, Krajewski W, Szydełko T, Małkiewicz B. Artificial Intelligence in Urooncology: What We Have and What We Expect. Cancers (Basel) 2023; 15:4282. [PMID: 37686558 PMCID: PMC10486651 DOI: 10.3390/cancers15174282] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/17/2023] [Revised: 08/15/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
INTRODUCTION Artificial intelligence is transforming healthcare by driving innovation, automation, and optimization across various fields of medicine. The aim of this study was to determine whether artificial intelligence (AI) techniques can be used in the diagnosis, treatment planning, and monitoring of urological cancers. METHODOLOGY We conducted a thorough search for original and review articles published until 31 May 2022 in the PUBMED/Scopus database. Our search included several terms related to AI and urooncology. Articles were selected with the consensus of all authors. RESULTS Several types of AI can be used in the medical field. The most common forms of AI are machine learning (ML), deep learning (DL), neural networks (NNs), natural language processing (NLP) systems, and computer vision. AI can improve various domains related to the management of urologic cancers, such as imaging, grading, and nodal staging. AI can also help identify appropriate diagnoses, treatment options, and even biomarkers. In the majority of these instances, AI is as accurate as or sometimes even superior to medical doctors. CONCLUSIONS AI techniques have the potential to revolutionize the diagnosis, treatment, and monitoring of urologic cancers. The use of AI in urooncology care is expected to increase in the future, leading to improved patient outcomes and better overall management of these tumors.
Affiliation(s)
- Anita Froń
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| | - Alina Semianiuk
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| | - Uladzimir Lazuk
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| | - Kuba Ptaszkowski
- Department of Physiotherapy, Wroclaw Medical University, 50-368 Wroclaw, Poland;
| | - Agnieszka Siennicka
- Department of Physiology and Pathophysiology, Wroclaw Medical University, 50-556 Wroclaw, Poland;
| | - Artur Lemiński
- Department of Urology and Urological Oncology, Pomeranian Medical University, 70-111 Szczecin, Poland;
| | - Wojciech Krajewski
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| | - Tomasz Szydełko
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| | - Bartosz Małkiewicz
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wroclaw Medical University, 50-556 Wroclaw, Poland; (A.S.); (U.L.); (W.K.); (T.S.)
| |
|
163
|
Shan D, Wang J, Qi P, Lu J, Wang D. Non-Contrasted CT Radiomics for SAH Prognosis Prediction. Bioengineering (Basel) 2023; 10:967. [PMID: 37627852 PMCID: PMC10451737 DOI: 10.3390/bioengineering10080967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 08/27/2023] Open
Abstract
Subarachnoid hemorrhage (SAH) denotes a serious type of hemorrhagic stroke that often leads to a poor prognosis and poses a significant socioeconomic burden. Timely assessment of the prognosis of SAH patients is of paramount clinical importance for medical decision making. Currently, clinical prognosis evaluation heavily relies on patients' clinical information, which suffers from limited accuracy. Non-contrast computed tomography (NCCT) is the primary diagnostic tool for SAH. Radiomics, an emerging technology, involves extracting quantitative radiomics features from medical images to serve as diagnostic markers. However, there is a scarcity of studies exploring the prognostic prediction of SAH using NCCT radiomics features. The objective of this study is to utilize machine learning (ML) algorithms that leverage NCCT radiomics features for the prognostic prediction of SAH. Retrospectively, we collected NCCT and clinical data of SAH patients treated at Beijing Hospital between May 2012 and November 2022. The modified Rankin Scale (mRS) was utilized to assess the prognosis of patients with SAH at the 3-month mark after the SAH event. Based on follow-up data, patients were classified into two groups: good outcome (mRS ≤ 2) and poor outcome (mRS > 2) groups. The region of interest in NCCT images was delineated using 3D Slicer software, and radiomic features were extracted. The most stable and significant radiomic features were identified using the intraclass correlation coefficient, t-test, and least absolute shrinkage and selection operator (LASSO) regression. The data were randomly divided into training and testing cohorts in a 7:3 ratio. Various ML algorithms were utilized to construct predictive models, encompassing logistic regression (LR), support vector machine (SVM), random forest (RF), light gradient boosting machine (LGBM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and multi-layer perceptron (MLP). 
Seven prediction models based on radiomic features related to the outcome of SAH patients were constructed using the training cohort. Internal validation was performed using five-fold cross-validation in the entire training cohort. The receiver operating characteristic curve, accuracy, precision, recall, and F1 score were employed to assess the performance of the classifier in the overall dataset. Furthermore, decision curve analysis was conducted to evaluate model effectiveness. The study included 105 SAH patients. A comprehensive set of 1316 radiomics features was initially derived, from which 13 distinct features were chosen for the construction of the ML model. Significant differences in age were observed between patients with good and poor outcomes. Among the seven constructed models, model_SVM performed best during five-fold cross-validation, with an average area under the curve (AUC) of 0.98 (standard deviation: 0.01) and 0.88 (standard deviation: 0.08) on the training and testing cohorts, respectively. In the overall dataset, model_SVM achieved an accuracy, precision, recall, F1 score, and AUC of 0.88, 0.84, 0.87, 0.84, and 0.82, respectively, in the testing cohort. Radiomics features associated with the outcome of SAH patients were successfully obtained, and seven ML models were constructed. Model_SVM exhibited the best predictive performance. The radiomics model has the potential to guide SAH prognosis prediction and treatment.
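The evaluation steps named in this abstract (dichotomizing the mRS at 2, a 7:3 split, and accuracy, precision, recall, and F1 score) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `dichotomize_mrs`, `split_7_3`, and `binary_metrics`, the fixed seed, and the convention that label 1 means poor outcome are all assumptions made for illustration.

```python
import random


def dichotomize_mrs(mrs_scores):
    """Map mRS scores to labels: 0 = good outcome (mRS <= 2), 1 = poor outcome (mRS > 2)."""
    return [1 if score > 2 else 0 for score in mrs_scores]


def split_7_3(n_samples, seed=0):
    """Randomly split sample indices into training and testing sets in a 7:3 ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for reproducibility
    cut = (7 * n_samples) // 10           # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With the 105 patients reported in the abstract, `split_7_3(105)` under this floor-based convention yields 73 training and 32 testing indices; the abstract does not state how the authors rounded the split.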
Affiliation(s)
- Dezhi Shan
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China; (D.S.)
- Graduate School, Peking Union Medical College, Beijing 100730, China
| | - Junjie Wang
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China; (D.S.)
| | - Peng Qi
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China; (D.S.)
| | - Jun Lu
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China; (D.S.)
| | - Daming Wang
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China; (D.S.)
- Graduate School, Peking Union Medical College, Beijing 100730, China
| |
|
164
|
Hepkema J, Lee NK, Stewart BJ, Ruangroengkulrith S, Charoensawan V, Clatworthy MR, Hemberg M. Predicting the impact of sequence motifs on gene regulation using single-cell data. Genome Biol 2023; 24:189. [PMID: 37582793 PMCID: PMC10426127 DOI: 10.1186/s13059-023-03021-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remains challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression in multiple mouse tissues. Applying scover to distal enhancers identified using scATAC-seq from the developing human brain, we identify cell type-specific motif activities in distal enhancers. Scover can identify regulatory motifs and their importance from single-cell data where all parameters and outputs are easily interpretable.
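As a rough illustration of how a convolutional filter can act as a motif detector on DNA sequence, the sketch below one-hot encodes a sequence, slides a single filter along it, and keeps the maximum activation (global max pooling). This is not the scover implementation: the helper names `one_hot`, `exact_match_filter`, and `motif_scan` are hypothetical, and a trained model would learn real-valued filter weights rather than use the exact-match weights assumed here.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def exact_match_filter(motif):
    """Build an illustrative filter that scores 1 per position matching the given motif."""
    return one_hot(motif)


def motif_scan(seq, filter_weights):
    """Slide the filter over the sequence and return the maximum activation (global max pooling)."""
    encoded = one_hot(seq)
    width = len(filter_weights)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    activations = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * filter_weights[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        activations.append(score)
    return max(activations)
```

A sequence containing the full motif scores the filter width (for example, 4.0 for `TATA` in `GGTATAGG`), while partial matches score lower. In a model of the kind the abstract describes, such pooled activations would feed a further layer that relates motif presence to expression per cell type.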
Affiliation(s)
- Jacob Hepkema
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Nicholas Keone Lee
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
| | - Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Siwat Ruangroengkulrith
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Varodom Charoensawan
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, 7310, Thailand
- Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.
- Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, 02115, USA.
| |
|
165
|
Li L, Pu C, Jin N, Zhu L, Hu Y, Cascone P, Tao Y, Zhang H. Prediction of 5-year overall survival of tongue cancer based machine learning. BMC Oral Health 2023; 23:567. [PMID: 37574562 PMCID: PMC10423415 DOI: 10.1186/s12903-023-03255-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 07/27/2023] [Indexed: 08/15/2023] Open
Abstract
OBJECTIVE We aimed to develop a 5-year overall survival prediction model for patients with oral tongue squamous cell carcinoma based on machine learning methods. SUBJECTS AND METHODS The data were obtained from electronic medical records of 224 OTSCC patients at the PLA General Hospital. A five-year overall survival prediction model was constructed using logistic regression, Support Vector Machines, Decision Tree, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine. Model performance was evaluated according to the area under the curve (AUC) of the receiver operating characteristic curve. The output of the optimal model was explained using the Python package (SHapley Additive exPlanations, SHAP). RESULTS After passing through the grid search and secondary modeling, the Light Gradient Boosting Machine was the best prediction model (AUC = 0.860). As explained by SHapley Additive exPlanations, N-stage, age, systemic inflammation response index, positive lymph nodes, plasma fibrinogen, lymphocyte-to-monocyte ratio, neutrophil percentage, and T-stage could perform a 5-year overall survival prediction for OTSCC. The 5-year survival rate was 42%. CONCLUSION The Light Gradient Boosting Machine prediction model predicted 5-year overall survival in OTSCC patients, and this predictive tool has potential prognostic implications for patients with OTSCC.
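The AUC used here to compare models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the example scores are illustrative assumptions, not values or code from the study.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for four patients
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this toy example three of the four positive-negative pairs are ordered correctly, so `example_auc` is 0.75. The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred patients.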
Affiliation(s)
- Liangbo Li
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Cheng Pu
- Key Laboratory of Animal Disease and Human Health of Sichuan Province, Chengdu, China
- College of Veterinary Medicine, Sichuan Agricultural University, Sichuan, China
| | - Nenghao Jin
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Liang Zhu
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Yanchun Hu
- Key Laboratory of Animal Disease and Human Health of Sichuan Province, Chengdu, China
- College of Veterinary Medicine, Sichuan Agricultural University, Sichuan, China
| | - Piero Cascone
- Unicamillus International Medical University, Rome, Italy
| | - Ye Tao
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China.
| | - Haizhong Zhang
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China.
| |
|
166
|
Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023; 6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.
Affiliation(s)
- Remo Monti
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany;
- Digital Health-Machine Learning, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Uwe Ohler
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany;
| |
|
167
|
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France;
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Flora Jay
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France;
| |
|
168
|
Van den Broeck L, Bhosale DK, Song K, Fonseca de Lima CF, Ashley M, Zhu T, Zhu S, Van De Cotte B, Neyt P, Ortiz AC, Sikes TR, Aper J, Lootens P, Locke AM, De Smet I, Sozzani R. Functional annotation of proteins for signaling network inference in non-model species. Nat Commun 2023; 14:4654. [PMID: 37537196 PMCID: PMC10400656 DOI: 10.1038/s41467-023-40365-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 07/25/2023] [Indexed: 08/05/2023] Open
Abstract
Molecular biology aims to understand cellular responses and regulatory dynamics in complex biological systems. However, these studies remain challenging in non-model species due to poor functional annotation of regulatory proteins. To overcome this limitation, we develop a multi-layer neural network that determines protein functionality directly from the protein sequence. We annotate kinases and phosphatases in Glycine max. We use the functional annotations from our neural network, Bayesian inference principles, and high-resolution phosphoproteomics to infer phosphorylation signaling cascades in soybean exposed to cold, and identify Glyma.10G173000 (TOI5) and Glyma.19G007300 (TOT3) as key temperature regulators. Importantly, the signaling cascade inference does not rely upon known kinase motifs or interaction data, enabling de novo identification of kinase-substrate interactions. Finally, our neural network generalizes and scales, so we extend our predictions to Oryza sativa, Zea mays, Sorghum bicolor, and Triticum aestivum. Taken together, we develop a signaling inference approach for non-model species leveraging our predicted kinases and phosphatases.
Affiliation(s)
- Lisa Van den Broeck
- Plant and Microbial Biology Department and NC Plant Sciences Initiative, North Carolina State University, Raleigh, NC, 27695, USA.
| | - Dinesh Kiran Bhosale
- Electrical and Computer Engineering Department, North Carolina State University, Raleigh, NC, 27695, USA
| | - Kuncheng Song
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
| | - Cássio Flavio Fonseca de Lima
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Michael Ashley
- Electrical and Computer Engineering Department, North Carolina State University, Raleigh, NC, 27695, USA
| | - Tingting Zhu
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Shanshuo Zhu
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Brigitte Van De Cotte
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Pia Neyt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Anna C Ortiz
- USDA-ARS Soybean & Nitrogen Fixation Research Unit, Raleigh, NC, 27607, USA
| | - Tiffany R Sikes
- USDA-ARS Soybean & Nitrogen Fixation Research Unit, Raleigh, NC, 27607, USA
| | - Jonas Aper
- Protealis NV, Technologiepark-Zwijnaarde 94, 9052, Ghent, Belgium
| | - Peter Lootens
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), 9090, Melle, Belgium
| | - Anna M Locke
- USDA-ARS Soybean & Nitrogen Fixation Research Unit, Raleigh, NC, 27607, USA
- Department of Crop and Soil Sciences and NC Plant Sciences Initiative, North Carolina State University, Raleigh, NC, 27695, USA
| | - Ive De Smet
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052, Ghent, Belgium
- VIB Center for Plant Systems Biology, B-9052, Ghent, Belgium
| | - Rosangela Sozzani
- Plant and Microbial Biology Department and NC Plant Sciences Initiative, North Carolina State University, Raleigh, NC, 27695, USA.
| |
|
169
|
Rösler W, Altenbuchinger M, Baeßler B, Beissbarth T, Beutel G, Bock R, von Bubnoff N, Eckardt JN, Foersch S, Loeffler CML, Middeke JM, Mueller ML, Oellerich T, Risse B, Scherag A, Schliemann C, Scholz M, Spang R, Thielscher C, Tsoukakis I, Kather JN. An overview and a roadmap for artificial intelligence in hematology and oncology. J Cancer Res Clin Oncol 2023; 149:7997-8006. [PMID: 36920563 PMCID: PMC10374829 DOI: 10.1007/s00432-023-04667-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/09/2023] [Accepted: 02/23/2023] [Indexed: 03/16/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is influencing our society on many levels and has broad implications for the future practice of hematology and oncology. However, for many medical professionals and researchers, it often remains unclear what AI can and cannot do, and what are promising areas for a sensible application of AI in hematology and oncology. Finally, the limits and perils of using AI in oncology are not obvious to many healthcare professionals. METHODS In this article, we provide an expert-based consensus statement by the joint Working Group on "Artificial Intelligence in Hematology and Oncology" by the German Society of Hematology and Oncology (DGHO), the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), and the Special Interest Group Digital Health of the German Informatics Society (GI). We provide a conceptual framework for AI in hematology and oncology. RESULTS First, we propose a technological definition, which we deliberately set in a narrow frame to mainly include the technical developments of the last ten years. Second, we present a taxonomy of clinically relevant AI systems, structured according to the type of clinical data they are used to analyze. Third, we show an overview of potential applications, including clinical, research, and educational environments with a focus on hematology and oncology. CONCLUSION Thus, this article provides a point of reference for hematologists and oncologists, and at the same time sets forth a framework for the further development and clinical deployment of AI in hematology and oncology in the future.
Affiliation(s)
- Wiebke Rösler
- Department for Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
| | - Michael Altenbuchinger
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen, Germany
| | - Bettina Baeßler
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
| | - Tim Beissbarth
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen, Germany
| | - Gernot Beutel
- Department for Hematology, Hemostasis, Oncology and Stem Cell Transplantation, Hannover Medical School, Hannover, Germany
| | - Robert Bock
- IMMS Institute for Microelectronics and Mechatronics Systems GmbH (NPO), Ilmenau, Germany
| | - Nikolas von Bubnoff
- Department of Hematology and Oncology, Medical Center, University of Schleswig Holstein, Campus Lübeck, Lübeck, Germany
| | - Jan-Niklas Eckardt
- Department of Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Else Kroener Fresenius Center for Digital Health (EFFZ), Technical University Dresden, Dresden, Germany
| | - Sebastian Foersch
- Institute of Pathology, University Medical Center Mainz, Mainz, Germany
| | - Chiara M L Loeffler
- Department of Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Else Kroener Fresenius Center for Digital Health (EFFZ), Technical University Dresden, Dresden, Germany
| | - Jan Moritz Middeke
- Department of Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Else Kroener Fresenius Center for Digital Health (EFFZ), Technical University Dresden, Dresden, Germany
| | | | - Thomas Oellerich
- Medizinische Klinik 2-Haematology/Oncology, University Hospital, Frankfurt am Main, Germany
| | - Benjamin Risse
- Computer Vision and Machine Learning Systems Group, Institute for Geoinformatics, University of Münster, Münster, Germany
| | - André Scherag
- Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital - Friedrich Schiller University, Jena, Germany
| | | | - Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
| | - Rainer Spang
- Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
| | | | - Ioannis Tsoukakis
- Department of Hematology and Oncology, Sana Klinikum Offenbach, Offenbach, Germany
| | - Jakob Nikolas Kather
- Department of Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany.
- Else Kroener Fresenius Center for Digital Health (EFFZ), Technical University Dresden, Dresden, Germany.
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
| |
|
170
|
Li Z, Portillo-Ledesma S, Schlick T. Techniques for and challenges in reconstructing 3D genome structures from 2D chromosome conformation capture data. Curr Opin Cell Biol 2023; 83:102209. [PMID: 37506571 PMCID: PMC10529954 DOI: 10.1016/j.ceb.2023.102209] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/15/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
Chromosome conformation capture technologies that provide frequency information for contacts between genomic regions have been crucial for increasing our understanding of genome folding and regulation. However, such data do not provide direct evidence of the spatial 3D organization of chromatin. In this opinion article, we discuss the development and application of computational methods to reconstruct chromatin 3D structures from experimental 2D contact data, highlighting how such modeling provides biological insights and can suggest mechanisms anchored to experimental data. By applying different reconstruction methods to the same contact data, we illustrate the state of the art of these techniques and discuss our gene-resolution approach based on Brownian dynamics and Monte Carlo sampling.
Affiliation(s)
- Zilong Li
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Stephanie Portillo-Ledesma
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Tamar Schlick
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, 10012, NY, USA; New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Room 340, Geography Building, 3663 North Zhongshan Road, Shanghai, 200122, China; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA.
171
Tan J, Shenker-Tauris N, Rodriguez-Hernaez J, Wang E, Sakellaropoulos T, Boccalatte F, Thandapani P, Skok J, Aifantis I, Fenyö D, Xia B, Tsirigos A. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat Biotechnol 2023; 41:1140-1150. [PMID: 36624151 PMCID: PMC10329734 DOI: 10.1038/s41587-022-01612-8] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 05/05/2022] [Accepted: 11/14/2022] [Indexed: 01/11/2023]
Abstract
Investigating how chromatin organization determines cell-type-specific gene expression remains challenging. Experimental methods for measuring three-dimensional chromatin organization, such as Hi-C, are costly and have technical limitations, restricting their broad application particularly in high-throughput genetic perturbations. We present C.Origami, a multimodal deep neural network that performs de novo prediction of cell-type-specific chromatin organization using DNA sequence and two cell-type-specific genomic features: CTCF binding and chromatin accessibility. C.Origami enables in silico experiments to examine the impact of genetic changes on chromatin interactions. We further developed an in silico genetic screening approach to assess how individual DNA elements may contribute to chromatin organization and to identify putative cell-type-specific trans-acting regulators that collectively determine chromatin architecture. Applying this approach to leukemia cells and normal T cells, we demonstrate that cell-type-specific in silico genetic screening, enabled by C.Origami, can be used to systematically discover novel chromatin regulation circuits in both normal and disease-related biological systems.
Affiliation(s)
- Jimin Tan
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA
- Nina Shenker-Tauris
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA
- Javier Rodriguez-Hernaez
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA
- Eric Wang
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- The Jackson Laboratory for Genomics Medicine, Farmington, CT, USA
- Francesco Boccalatte
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Department of Women's and Children's Health, University of Padua, Padua, Italy
- Palaniraja Thandapani
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Jane Skok
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Iannis Aifantis
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- David Fenyö
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Bo Xia
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA.
- Society of Fellows, Harvard University, Cambridge, MA, USA.
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Aristotelis Tsirigos
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA.
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA.
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA.
172
Ma W, Fu Y, Bao Y, Wang Z, Lei B, Zheng W, Wang C, Liu Y. DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants. Int J Mol Sci 2023; 24:12023. [PMID: 37569400 PMCID: PMC10418434 DOI: 10.3390/ijms241512023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 07/13/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performance when applied to other species. Furthermore, the limited exploration of many species, particularly in the case of livestock, has led to a scarcity of comprehensive and high-quality epigenetic data, posing challenges in developing reliable deep learning models for decoding their non-coding genomes. The cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and making use of the conserved DNA binding preferences of transcription factors within the same tissue. In this study, we introduced DeepSATA, a novel deep learning-based sequence analyzer that incorporates the transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to analyze the genomes of pigs, chickens, cattle, humans, and mice, we demonstrated its ability to improve the prediction accuracy of chromatin accessibility and achieve reliable cross-species predictions in animals. Additionally, we showcased its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool to explore the epigenomic landscape of various species and pinpoint regulatory deoxyribonucleic acid (DNA) variants associated with complex traits.
Affiliation(s)
- Wenlong Ma
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yang Fu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yongzhou Bao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- School of Life Sciences, Henan University, Kaifeng 475004, China
- Zhen Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- School of Life Sciences, Henan University, Kaifeng 475004, China
- Bowen Lei
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Weigang Zheng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Chao Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China; (W.M.); (Y.F.); (Y.B.); (Z.W.); (B.L.); (W.Z.); (C.W.)
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
173
Limeta A, Gatto F, Herrgård MJ, Ji B, Nielsen J. Leveraging high-resolution omics data for predicting responses and adverse events to immune checkpoint inhibitors. Comput Struct Biotechnol J 2023; 21:3912-3919. [PMID: 37602228 PMCID: PMC10432706 DOI: 10.1016/j.csbj.2023.07.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 07/17/2023] [Accepted: 07/22/2023] [Indexed: 08/22/2023] Open
Abstract
A long-standing goal of personalized and precision medicine is to enable accurate prediction of the outcomes of a given treatment regimen for patients harboring a disease. Currently, many clinical trials fail to meet their endpoints due to underlying factors in the patient population that contribute to either poor responses to the drug of interest or to treatment-related adverse events. Identifying these factors beforehand and correcting for them can lead to an increased success of clinical trials. Comprehensive and large-scale data gathering efforts in biomedicine by omics profiling of healthy and diseased individuals have led to a treasure trove of host, disease and environmental factors that contribute to the effectiveness of drugs aiming to treat disease. With increasing omics data, artificial intelligence allows an in-depth analysis of big data and offers a wide range of applications for real-world clinical use, including improved patient selection and identification of actionable targets for companion therapeutics for improved translatability across more patients. As a blueprint for complex drug-disease-host interactions, we here discuss the challenges of utilizing omics data for predicting responses and adverse events in cancer immunotherapy with immune checkpoint inhibitors (ICIs). The omics-based methodologies for improving patient outcomes as in the ICI case have also been applied across a wide range of complex disease settings, exemplifying the use of omics for in-depth disease profiling and clinical use.
Affiliation(s)
- Angelo Limeta
- Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Francesco Gatto
- Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Department of Oncology-Pathology, Karolinska Institute, 171 64 Stockholm, Sweden
- Boyang Ji
- BioInnovation Institute, 2200 Copenhagen N, Denmark
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- BioInnovation Institute, 2200 Copenhagen N, Denmark
174
Sigurdsson AI, Louloudis I, Banasik K, Westergaard D, Winther O, Lund O, Ostrowski S, Erikstrup C, Pedersen O, Nyegaard M, Brunak S, Vilhjálmsson B, Rasmussen S. Deep integrative models for large-scale human genomics. Nucleic Acids Res 2023; 51:e67. [PMID: 37224538 PMCID: PMC10325897 DOI: 10.1093/nar/gkad373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 04/18/2023] [Accepted: 04/28/2023] [Indexed: 05/26/2023] Open
Abstract
Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics, and more recently individual-level data. However, these predictors mainly capture additive relationships and are limited in data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model demonstrated a competitive performance compared to established neural network architectures, particularly for certain traits, showcasing its potential in modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for Type 1 Diabetes, likely due to modeling non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
Affiliation(s)
- Arnór I Sigurdsson
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ioannis Louloudis
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- David Westergaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Ole Winther
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen 2100, Denmark
- Ole Lund
- Danish National Genome Center, Ørestads Boulevard 5, 2300 Copenhagen S, Denmark
- DTU Health Tech, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Sisse Rye Ostrowski
- Department of Clinical Immunology, Rigshospitalet, University of Copenhagen, 2200 Copenhagen N, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christian Erikstrup
- Department of Clinical Immunology, Aarhus University Hospital, 8000 Aarhus C, Denmark
- Department of Clinical Medicine, Aarhus University, 8000 Aarhus C, Denmark
- Ole Birger Vesterager Pedersen
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Department of Clinical Immunology, Zealand University Hospital, 4600 Køge, Denmark
- Mette Nyegaard
- Department of Health Science and Technology, Aalborg University, DK- 9260 Gistrup, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Bjarni J Vilhjálmsson
- National Centre for Register-Based Research (NCRR), Aarhus University, 8000 Aarhus C, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), 8210 Aarhus V, Denmark
- Bioinformatics Research Centre (BiRC), Aarhus University, 8000 Aarhus C, Denmark
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
175
Heng SZW, Hoo R, Tan DSW. Coexisting Genomic Alterations in Risk Stratification of KRAS G12C-Mutated Non-Small Cell Lung Cancer. Cancer Discov 2023; 13:1513-1515. [PMID: 37416990 DOI: 10.1158/2159-8290.cd-23-0489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/08/2023]
Abstract
SUMMARY: Negrao and colleagues showed that coalterations in three genes (KEAP1, SMARCA4, and CDKN2A) correlated with poor clinical outcomes in patients with KRAS G12C-mutated non-small cell lung cancer treated with sotorasib or adagrasib. Their study highlights how pooling high-resolution real-world genomic data with clinical outcomes can potentially facilitate risk-stratified precision therapies. See related article by Negrao et al., p. 1556 (2).
Affiliation(s)
- Sarina Z W Heng
- Cancer and Therapeutics Research Laboratory, National Cancer Center Singapore, Singapore
- Division of Medical Oncology, National Cancer Center Singapore, Singapore
- Regina Hoo
- Cancer and Therapeutics Research Laboratory, National Cancer Center Singapore, Singapore
- Division of Medical Oncology, National Cancer Center Singapore, Singapore
- Daniel S W Tan
- Cancer and Therapeutics Research Laboratory, National Cancer Center Singapore, Singapore
- Division of Medical Oncology, National Cancer Center Singapore, Singapore
- Genome Institute of Singapore, Singapore
- Duke-NUS Medical School Singapore, Singapore
176
Lee M. Deep learning in CRISPR-Cas systems: a review of recent studies. Front Bioeng Biotechnol 2023; 11:1226182. [PMID: 37469443 PMCID: PMC10352112 DOI: 10.3389/fbioe.2023.1226182] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/24/2023] [Accepted: 06/22/2023] [Indexed: 07/21/2023] Open
Abstract
In genetic engineering, the revolutionary CRISPR-Cas system has proven to be a vital tool for precise genome editing. Simultaneously, the emergence and rapid evolution of deep learning methodologies has provided an impetus to the scientific exploration of genomic data. These concurrent advancements mandate regular investigation of the state-of-the-art, particularly given the pace of recent developments. This review focuses on the significant progress achieved during 2019-2023 in the utilization of deep learning for predicting guide RNA (gRNA) activity in the CRISPR-Cas system, a key element determining the effectiveness and specificity of genome editing procedures. In this paper, an analytical overview of contemporary research is provided, with emphasis placed on the amalgamation of artificial intelligence and genetic engineering. The importance of our review is underscored by the necessity to comprehend the rapidly evolving deep learning methodologies and their potential impact on the effectiveness of the CRISPR-Cas system. By analyzing recent literature, this review highlights the achievements and emerging trends in the integration of deep learning with the CRISPR-Cas systems, thus contributing to the future direction of this essential interdisciplinary research area.
177
Raudenska M, Vicar T, Gumulec J, Masarik M. Johann Gregor Mendel: the victory of statistics over human imagination. Eur J Hum Genet 2023; 31:744-748. [PMID: 36755104 PMCID: PMC9909140 DOI: 10.1038/s41431-023-01303-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 01/11/2023] [Accepted: 01/24/2023] [Indexed: 02/10/2023] Open
Abstract
In 2022, we celebrated 200 years since the birth of Johann Gregor Mendel. Although his contributions to science went unrecognized during his lifetime, Mendel not only described the principles of monogenic inheritance but also pioneered the modern way of doing science based on precise experimental data acquisition and evaluation. Novel statistical and algorithmic approaches are now at the center of scientific work, showing that work that is considered marginal in one era can become a mainstream research approach in the next era. The onset of data-driven science caused a shift from hypothesis-testing to hypothesis-generating approaches in science. Mendel is remembered here as a promoter of this approach, and the benefits of big data and statistical approaches are discussed.
Affiliation(s)
- Martina Raudenska
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Tomas Vicar
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 3058/10, Brno, Czech Republic
- Jaromir Gumulec
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Michal Masarik
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic.
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic.
- BIOCEV, First Faculty of Medicine, Charles University, Prumyslova 595, CZ-252 50, Vestec, Czech Republic.
178
Chowdhary K, Benoist C. A variegated model of transcription factor function in the immune system. Trends Immunol 2023; 44:530-541. [PMID: 37258360 PMCID: PMC10332489 DOI: 10.1016/j.it.2023.05.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Revised: 04/26/2023] [Accepted: 05/01/2023] [Indexed: 06/02/2023]
Abstract
Specific combinations of transcription factors (TFs) control the gene expression programs that underlie specialized immune responses. Previous models of TF function in immunocytes had restricted each TF to a single functional categorization [e.g., lineage-defining (LDTFs) vs. signal-dependent TFs (SDTFs)] within one cell type. Synthesizing recent results, we instead propose a variegated model of immunological TF function, whereby many TFs have flexible and different roles across distinct cell states, contributing to cell phenotypic diversity. We discuss evidence in support of this variegated model, describe contextual inputs that enable TF diversification, and look to the future to imagine warranted experimental and computational tools to build quantitative and predictive models of immunocyte gene regulatory networks.
179
Meriranta L, Pitkänen E, Leppä S. Blood has never been thicker: Cell-free DNA fragmentomics in the liquid biopsy toolbox of B-cell lymphomas. Semin Hematol 2023; 60:132-141. [PMID: 37455222 DOI: 10.1053/j.seminhematol.2023.06.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 05/30/2023] [Accepted: 06/24/2023] [Indexed: 07/18/2023]
Abstract
Liquid biopsies utilizing plasma circulating tumor DNA (ctDNA) are anticipated to revolutionize decision-making in cancer care. In the field of lymphomas, ctDNA-based blood tests represent the forefront of clinically applicable tools to harness decades of genomic research for disease profiling, quantification, and detection. More recently, the discovery of nonrandom fragmentation patterns in cell-free DNA (cfDNA) has opened another avenue of liquid biopsy research beyond mutational interrogation of ctDNA. Through examination of structural features, nucleotide content, and genomic distribution of massive numbers of plasma cfDNA molecules, the study of fragmentomics aims at identifying new tools that augment existing ctDNA-based analyses and discover new ways to profile cancer from blood tests. Indeed, the characterization of aberrant lymphoma ctDNA fragment patterns and harnessing them with powerful machine-learning techniques are expected to unleash the potential of nonmutant molecules for liquid biopsy purposes. In this article, we review cfDNA fragmentomics as an emerging approach in the ctDNA research of B-cell lymphomas. We summarize the biology behind the formation of cfDNA fragment patterns and discuss the preanalytical and technical limitations faced with current methodologies. Then we go through the advances in the field of lymphomas and envision what other noninvasive tools based on fragment characteristics could be explored. Last, we place fragmentomics as one of the facets of ctDNA analyses in emerging multiview and multiomics liquid biopsies. We pay attention to the unknowns in the field of cfDNA fragmentation biology that warrant further mechanistic investigation to provide rational background for the development of these precision oncology tools and understanding of their limitations.
Affiliation(s)
- Leo Meriranta
- Applied Tumor Genomics, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Oncology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland; iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
- Esa Pitkänen
- Applied Tumor Genomics, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland; iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland; Institute for Molecular Medicine Finland (FIMM), HILIFE, Helsinki, Finland
- Sirpa Leppä
- Applied Tumor Genomics, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Oncology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland; iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
| |
|
180
|
Hill C, Hudaiberdiev S, Ovcharenko I. ChromDL: a next-generation regulatory DNA classifier. Bioinformatics 2023; 39:i377-i385. [PMID: 37387183 PMCID: PMC10311331 DOI: 10.1093/bioinformatics/btad217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine-learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA. RESULTS Using a comparative analysis of the performance of thousands of Deep Learning architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units, convolutional neural networks, and bidirectional long short-term memory units, which significantly improves upon a range of prediction metrics compared to its predecessors in transcription factor binding site, histone modification, and DNase-I hyper-sensitive site detection. Combined with a secondary model, it can be utilized for accurate classification of gene regulatory elements. The model can also detect weak transcription factor binding as compared to previously developed methods and has the potential to help delineate transcription factor binding motif specificities. AVAILABILITY AND IMPLEMENTATION The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
Affiliation(s)
- Christopher Hill
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, United States
| | - Sanjarbek Hudaiberdiev
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
| | - Ivan Ovcharenko
- Computational Biology Branch, Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, United States
| |
|
181
|
Yang M, Ma J. UNADON: transformer-based model to predict genome-wide chromosome spatial position. Bioinformatics 2023; 39:i553-i562. [PMID: 37387176 PMCID: PMC10311299 DOI: 10.1093/bioinformatics/btad246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The spatial positioning of chromosomes relative to functional nuclear bodies is intertwined with genome functions such as transcription. However, the sequence patterns and epigenomic features that collectively influence chromatin spatial positioning in a genome-wide manner are not well understood. RESULTS Here, we develop a new transformer-based deep learning model called UNADON, which predicts the genome-wide cytological distance to a specific type of nuclear body, as measured by TSA-seq, using both sequence features and epigenomic signals. Evaluations of UNADON in four cell lines (K562, H1, HFFc6, HCT116) show high accuracy in predicting chromatin spatial positioning to nuclear bodies when trained on a single cell line. UNADON also performed well in an unseen cell type. Importantly, we reveal potential sequence and epigenomic factors that affect large-scale chromatin compartmentalization in nuclear bodies. Together, UNADON provides new insights into the principles between sequence features and large-scale chromatin spatial localization, which has important implications for understanding nuclear structure and function. AVAILABILITY AND IMPLEMENTATION The source code of UNADON can be found at https://github.com/ma-compbio/UNADON.
Affiliation(s)
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213, USA
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh PA 15213, USA
| |
|
182
|
Novakovsky G, Fornes O, Saraswat M, Mostafavi S, Wasserman WW. ExplaiNN: interpretable and transparent neural networks for genomics. Genome Biol 2023; 24:154. [PMID: 37370113 DOI: 10.1186/s13059-023-02985-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/20/2022] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Deep learning models such as convolutional neural networks (CNNs) excel in genomic tasks but lack interpretability. We introduce ExplaiNN, which combines the expressiveness of CNNs with the interpretability of linear models. ExplaiNN can predict TF binding, chromatin accessibility, and de novo motifs, achieving performance comparable to state-of-the-art methods. Its predictions are transparent, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. ExplaiNN can serve as a plug-and-play platform for pretrained models and annotated position weight matrices. ExplaiNN aims to accelerate the adoption of deep learning in genomic sequence analysis by domain experts.
Affiliation(s)
- Gherman Novakovsky
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Manu Saraswat
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington (UW), Seattle, USA
| | - Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada.
| |
|
183
|
Lee M. Deep Learning Techniques with Genomic Data in Cancer Prognosis: A Comprehensive Review of the 2021-2023 Literature. BIOLOGY 2023; 12:893. [PMID: 37508326 PMCID: PMC10376033 DOI: 10.3390/biology12070893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023]
Abstract
Deep learning has brought about a significant transformation in machine learning, leading to an array of novel methodologies and consequently broadening its influence. The application of deep learning in various sectors, especially biomedical data analysis, has initiated a period filled with noteworthy scientific developments. This trend has majorly influenced cancer prognosis, where the interpretation of genomic data for survival analysis has become a central research focus. The capacity of deep learning to decode intricate patterns embedded within high-dimensional genomic data has provoked a paradigm shift in our understanding of cancer survival. Given the swift progression in this field, there is an urgent need for a comprehensive review that focuses on the most influential studies from 2021 to 2023. This review, through its careful selection and thorough exploration of dominant trends and methodologies, strives to fulfill this need. The paper aims to enhance our existing understanding of applications of deep learning in cancer survival analysis, while also highlighting promising directions for future research in this vibrant and rapidly proliferating field.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
|
184
|
Long E, Wan P, Chen Q, Lu Z, Choi J. From function to translation: Decoding genetic susceptibility to human diseases via artificial intelligence. CELL GENOMICS 2023; 3:100320. [PMID: 37388909 PMCID: PMC10300605 DOI: 10.1016/j.xgen.2023.100320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023]
Abstract
While genome-wide association studies (GWAS) have discovered thousands of disease-associated loci, molecular mechanisms for a considerable fraction of the loci remain to be explored. The logical next steps for post-GWAS are interpreting these genetic associations to understand disease etiology (GWAS functional studies) and translating this knowledge into clinical benefits for the patients (GWAS translational studies). Although various datasets and approaches using functional genomics have been developed to facilitate these studies, significant challenges remain due to data heterogeneity, multiplicity, and high dimensionality. To address these challenges, artificial intelligence (AI) technology has demonstrated considerable promise in decoding complex functional datasets and providing novel biological insights into GWAS findings. This perspective first describes the landmark progress driven by AI in interpreting and translating GWAS findings and then outlines specific challenges followed by actionable recommendations related to data availability, model optimization, and interpretation, as well as ethical concerns.
Affiliation(s)
- Erping Long
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Peixing Wan
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Qingyu Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| |
|
185
|
Ghoussaini M, Nelson MR, Dunham I. Future prospects for human genetics and genomics in drug discovery. Curr Opin Struct Biol 2023; 80:102568. [PMID: 36963162 PMCID: PMC7614359 DOI: 10.1016/j.sbi.2023.102568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2022] [Revised: 01/27/2023] [Accepted: 02/13/2023] [Indexed: 03/26/2023]
Abstract
Evidence from human genetics supporting the therapeutic hypothesis increases the likelihood that a drug will succeed in clinical trials. Rare and common disease genetics yield a wide array of alleles with a range of effect sizes that can proxy for the effect of a drug in disease. Recent advances in large scale population collections and whole genome sequencing approaches have provided a rich resource of human genetic evidence to support drug target selection. As the range of phenotypes profiled increases and ever more alleles are discovered across world-wide populations, these approaches will increasingly influence multiple stages across the lifespan of a drug discovery programme.
Affiliation(s)
- Maya Ghoussaini
- Wellcome Sanger Institute, Wellcome Genome Campus, United Kingdom; Open Targets, Wellcome Genome Campus, United Kingdom. https://twitter.com/MayaGhoussaini
| | | | - Ian Dunham
- Wellcome Sanger Institute, Wellcome Genome Campus, United Kingdom; Open Targets, Wellcome Genome Campus, United Kingdom; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, United Kingdom.
| |
|
186
|
Salvatore M, Horlacher M, Marsico A, Winther O, Andersson R. Transfer learning identifies sequence determinants of cell-type specific regulatory element accessibility. NAR Genom Bioinform 2023; 5:lqad026. [PMID: 37007588 PMCID: PMC10052367 DOI: 10.1093/nargab/lqad026] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/22/2022] [Revised: 03/01/2023] [Accepted: 03/07/2023] [Indexed: 04/03/2023] Open
Abstract
Dysfunction of regulatory elements through genetic variants is a central mechanism in the pathogenesis of disease. To better understand disease etiology, there is consequently a need to understand how DNA encodes regulatory activity. Deep learning methods show great promise for modeling of biomolecular data from DNA sequence but are limited to large input data for training. Here, we develop ChromTransfer, a transfer learning method that uses a pre-trained, cell-type agnostic model of open chromatin regions as a basis for fine-tuning on regulatory sequences. We demonstrate superior performances with ChromTransfer for learning cell-type specific chromatin accessibility from sequence compared to models not informed by a pre-trained model. Importantly, ChromTransfer enables fine-tuning on small input data with minimal decrease in accuracy. We show that ChromTransfer uses sequence features matching binding site sequences of key transcription factors for prediction. Together, these results demonstrate ChromTransfer as a promising tool for learning the regulatory code.
Affiliation(s)
- Marco Salvatore
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200, Copenhagen, Denmark
- Abzu ApS, 2150, Copenhagen, Denmark
| | - Marc Horlacher
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200, Copenhagen, Denmark
- Department of Computer Science, Technical University Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Munich, Germany
| | - Annalisa Marsico
- Computational Health Center, Helmholtz Center Munich, Munich, Germany
| | - Ole Winther
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200, Copenhagen, Denmark
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Department of Genomic medicine, Rigshospitalet, 2100 Copenhagen, Denmark
| | - Robin Andersson
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200, Copenhagen, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| |
|
187
|
Alakuş TB. A Novel Repetition Frequency-Based DNA Encoding Scheme to Predict Human and Mouse DNA Enhancers with Deep Learning. Biomimetics (Basel) 2023; 8:218. [PMID: 37366813 DOI: 10.3390/biomimetics8020218] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023] Open
Abstract
Recent studies have shown that DNA enhancers have an important role in the regulation of gene expression. They are involved in important biological processes such as development, homeostasis, and embryogenesis. However, experimental identification of these DNA enhancers is time-consuming and costly as it requires laboratory work. Therefore, researchers have looked for alternatives and started to apply computation-based deep learning algorithms to this field. Yet the inconsistent and poor prediction performance of computational approaches across various cell lines has led to the investigation of these approaches as well. Therefore, in this study, a novel DNA encoding scheme was proposed to address these problems, and DNA enhancers were predicted with BiLSTM. The study consisted of four stages for two scenarios. In the first stage, DNA enhancer data were obtained. In the second stage, DNA sequences were converted to numerical representations by both the proposed encoding scheme and various DNA encoding schemes including EIIP, integer number, and atomic number. In the third stage, the BiLSTM model was designed, and the data were classified. In the final stage, the performance of the DNA encoding schemes was determined by accuracy, precision, recall, F1-score, CSI, MCC, G-mean, Kappa coefficient, and AUC scores. In the first scenario, it was determined whether the DNA enhancers belonged to humans or mice. The highest performance was achieved with the proposed DNA encoding scheme, with an accuracy of 92.16% and an AUC score of 0.85. The closest accuracy score to the proposed scheme was obtained with the EIIP DNA encoding scheme, at 89.14%. The AUC score of this scheme was measured as 0.87. Among the remaining DNA encoding schemes, the atomic number showed an accuracy score of 86.61%, while this rate decreased to 76.96% with the integer scheme. The AUC values of these schemes were 0.84 and 0.82, respectively. In the second scenario, it was determined whether there was a DNA enhancer and, if so, to which species this enhancer belonged. In this scenario, the highest accuracy score was obtained with the proposed DNA encoding scheme and the result was 84.59%. Moreover, the AUC score of the proposed scheme was determined as 0.92. EIIP and integer DNA encoding schemes showed accuracy scores of 77.80% and 73.68%, respectively, while their AUC scores were close to 0.90. The least effective prediction was obtained with the atomic number scheme, with an accuracy score of 68.27% and an AUC score of 0.81. At the end of the study, it was observed that the proposed DNA encoding scheme was successful and effective in predicting DNA enhancers.
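The second stage described in the abstract above (converting DNA sequences to numerical representations) can be illustrated with a minimal sketch of the three baseline schemes it names. This is not the author's code, and the proposed repetition frequency-based scheme is not reproduced because the abstract does not specify it. The EIIP and atomic-number values are those commonly quoted in the DNA signal-processing literature; the integer mapping (A=1, C=2, G=3, T=4), the handling of unknown bases, and the names ENCODINGS and encode are assumptions made for illustration.

```python
# Illustrative nucleotide-to-number lookup tables (not taken from the paper).
# EIIP and atomic-number values are the ones commonly quoted in the
# DNA signal-processing literature; the integer mapping is only one of
# several conventions in use and is an assumption here.
ENCODINGS = {
    "eiip": {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335},
    "atomic": {"A": 70, "C": 58, "G": 78, "T": 66},
    "integer": {"A": 1, "C": 2, "G": 3, "T": 4},
}


def encode(sequence, scheme):
    """Map a DNA string to a list of numbers under the named scheme."""
    try:
        table = ENCODINGS[scheme]
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme!r}") from None
    # Symbols outside A/C/G/T (e.g. N) map to 0; that choice is an assumption.
    return [table.get(base, 0) for base in sequence.upper()]


if __name__ == "__main__":
    # Print the same short sequence under each scheme for comparison.
    for name in ENCODINGS:
        print(name, encode("ACGTN", name))
```

Each encoded list could then be padded to a fixed length and passed to a sequence classifier; how the study batched or padded its inputs is not stated in the abstract.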
Affiliation(s)
- Talha Burak Alakuş
- Department of Software Engineering, Faculty of Engineering, Kırklareli University, 39100 Kırklareli, Turkey
| |
|
188
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| |
|
189
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
| |
|
190
|
Zhu Y, Li J, Kim J, Li S, Zhao Y, Bahari J, Eliahoo P, Li G, Kawakita S, Haghniaz R, Gao X, Falcone N, Ermis M, Kang H, Liu H, Kim H, Tabish T, Yu H, Li B, Akbari M, Emaminejad S, Khademhosseini A. Skin-interfaced electronics: A promising and intelligent paradigm for personalized healthcare. Biomaterials 2023; 296:122075. [PMID: 36931103 PMCID: PMC10085866 DOI: 10.1016/j.biomaterials.2023.122075] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/26/2022] [Revised: 02/23/2023] [Accepted: 03/02/2023] [Indexed: 03/09/2023]
Abstract
Skin-interfaced electronics (skintronics) have received considerable attention due to their thinness, skin-like mechanical softness, excellent conformability, and multifunctional integration. Current advancements in skintronics have enabled health monitoring and digital medicine. Particularly, skintronics offer a personalized platform for early-stage disease diagnosis and treatment. In this comprehensive review, we discuss (1) the state-of-the-art skintronic devices, (2) material selections and platform considerations of future skintronics toward intelligent healthcare, (3) device fabrication and system integrations of skintronics, (4) an overview of the skintronic platform for personalized healthcare applications, including biosensing as well as wound healing, sleep monitoring, the assessment of SARS-CoV-2, and the augmented reality-/virtual reality-enhanced human-machine interfaces, and (5) current challenges and future opportunities of skintronics and their potentials in clinical translation and commercialization. The field of skintronics will not only minimize physical and physiological mismatches with the skin but also shift the paradigm in intelligent and personalized healthcare and offer unprecedented promise to revolutionize conventional medical practices.
Affiliation(s)
- Yangzhi Zhu
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States.
| | - Jinghang Li
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Jinjoo Kim
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Shaopei Li
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Yichao Zhao
- Interconnected and Integrated Bioelectronics Lab, Department of Electrical and Computer Engineering, and Materials Science and Engineering, University of California, Los Angeles, CA, 90095, United States
| | - Jamal Bahari
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Payam Eliahoo
- Biomedical Engineering Department, University of Southern California, Los Angeles, CA, 90007, United States
| | - Guanghui Li
- The Centre of Nanoscale Science and Technology and Key Laboratory of Functional Polymer Materials, Institute of Polymer Chemistry, College of Chemistry, Nankai University, Tianjin, 300071, China; Renewable Energy Conversion and Storage Center (RECAST), Nankai University, Tianjin, 300071, China
| | - Satoru Kawakita
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Reihaneh Haghniaz
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Xiaoxiang Gao
- Department of Nanoengineering, University of California, San Diego, La Jolla, CA, 92093, United States
| | - Natashya Falcone
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Menekse Ermis
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States
| | - Heemin Kang
- Department of Materials Science and Engineering, Korea University, Seoul, 02841, Republic of Korea
| | - Hao Liu
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi'an Jiaotong University, Xi'an, 710049, PR China
| | - HanJun Kim
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States; College of Pharmacy, Korea University, Sejong, 30019, Republic of Korea
| | - Tanveer Tabish
- Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 7BN, United Kingdom
| | - Haidong Yu
- Frontiers Science Center for Flexible Electronics, Xi'an Institute of Flexible Electronics (IFE) and Xi'an Institute of Biomedical Materials & Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China
| | - Bingbing Li
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States; Department of Manufacturing Systems Engineering and Management, California State University, Northridge, CA, 91330, United States
| | - Mohsen Akbari
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States; Laboratory for Innovation in Microengineering (LiME), Department of Mechanical Engineering, Center for Biomedical Research, University of Victoria, Victoria, BC V8P 2C5, Canada
| | - Sam Emaminejad
- Interconnected and Integrated Bioelectronics Lab, Department of Electrical and Computer Engineering, and Materials Science and Engineering, University of California, Los Angeles, CA, 90095, United States
| | - Ali Khademhosseini
- Terasaki Institute for Biomedical Innovation, Los Angeles, CA, 90064, United States.
| |
|
191
|
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
| | - Susan Klaeger
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
|
192
|
Li B, Altelaar M, van Breukelen B. Identification of Protein Complexes by Integrating Protein Abundance and Interaction Features Using a Deep Learning Strategy. Int J Mol Sci 2023; 24:ijms24097884. [PMID: 37175590 PMCID: PMC10178578 DOI: 10.3390/ijms24097884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Revised: 04/23/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023] Open
Abstract
Many essential cellular functions are carried out by multi-protein complexes that can be characterized by their protein-protein interactions. The interactions between protein subunits are critically dependent on the strengths of their interactions and their cellular abundances, both of which span orders of magnitude. Despite many efforts devoted to the global discovery of protein complexes by integrating large-scale protein abundance and interaction features, there is still room for improvement. Here, we integrated >7000 quantitative proteomic samples with three published affinity purification/co-fractionation mass spectrometry datasets into a deep learning framework to predict protein-protein interactions (PPIs), followed by the identification of protein complexes using a two-stage clustering strategy. Our deep-learning-based classifier significantly outperformed recently published machine learning prediction models and in the process captured 5010 complexes containing over 9000 unique proteins. The vast majority of proteins in our predicted complexes exhibited low or no tissue specificity, which indicates that the observed complexes tend to be ubiquitously expressed throughout all cell types and tissues. Interestingly, our combined approach increased the model sensitivity for low-abundance proteins, which, among other things, allowed us to detect the interaction of MCM10, which connects to the replicative helicase complex via the MCM6 protein. The integration of protein abundances and their interaction features using a deep learning approach provided a comprehensive map of protein-protein interactions and a unique perspective on possible novel protein complexes.
Affiliation(s)
- Bohui Li
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
| | - Maarten Altelaar
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Mass Spectrometry and Proteomics Facility, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Bas van Breukelen
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
|
193
|
Nikolados EM, Oyarzún DA. Deep learning for optimization of protein expression. Curr Opin Biotechnol 2023; 81:102941. [PMID: 37087839 DOI: 10.1016/j.copbio.2023.102941] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/22/2022] [Revised: 02/02/2023] [Accepted: 03/17/2023] [Indexed: 04/25/2023]
Abstract
Recent progress in high-throughput DNA synthesis and sequencing has enabled the development of massively parallel reporter assays for strain characterization. These datasets map a large number of DNA sequences to protein expression levels, sparking increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight advances in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We review recent works that built highly accurate models and discuss challenges that hinder adoption by end users. There is a need to better align this technology with the constraints encountered in strain engineering, particularly the cost of acquiring large amounts of data and the requirement for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.
Affiliation(s)
| | - Diego A Oyarzún
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK; The Alan Turing Institute, London NW1 2DB, UK.
|
194
|
Joiret M, Leclercq M, Lambrechts G, Rapino F, Close P, Louppe G, Geris L. Cracking the genetic code with neural networks. Front Artif Intell 2023; 6:1128153. [PMID: 37091301 PMCID: PMC10117997 DOI: 10.3389/frai.2023.1128153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 03/21/2023] [Indexed: 04/09/2023] Open
Abstract
The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could re-discover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcript and protein data. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated the effect of a codon embedding layer, which assesses the semantic similarity between codons, on the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins for faster deciphering of the codons of rare amino acids. Deep neural networks require huge amounts of training data, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons such as the tryptophan codon or the stop codons require a training dataset on the order of 4–22 million cumulative pairs of codons and their associated amino acids, presented to the neural network over around 7–40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
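To make the notion of codon and amino acid training pairs in the abstract above concrete, the following Python sketch splits a coding sequence into codons and aligns each codon with the corresponding residue of its protein. The function name `codon_pairs`, the use of `*` to mark a stop codon, and the toy sequence are assumptions for illustration only; the sketch does not reproduce the authors' data pipeline or network.

```python
from typing import List, Tuple


def codon_pairs(cds: str, protein: str) -> List[Tuple[str, str]]:
    """Pair each codon of a coding sequence with its amino acid.

    cds     -- coding DNA sequence, read in frame from its first base
    protein -- one-letter amino acid string; '*' (an assumed convention)
               marks a stop codon
    """
    cds = cds.upper()
    # Keep only complete codons; a trailing partial codon is ignored.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # zip stops at the shorter input, so a protein written without a
    # stop symbol simply leaves the final stop codon unpaired.
    return list(zip(codons, protein.upper()))


if __name__ == "__main__":
    # Toy example: Met, Trp, then a stop codon.
    for codon, residue in codon_pairs("ATGTGGTAA", "MW*"):
        print(codon, residue)
```

Pairs produced this way could serve as labeled examples for a classifier whose input is a codon and whose target is an amino acid, which is the kind of supervised task the abstract describes.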
Affiliation(s)
- Marc Joiret
- Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium
- *Correspondence: Marc Joiret
| | - Marine Leclercq
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Gaspard Lambrechts
- Department of Electrical Engineering and Computer Science, Artificial Intelligence and Deep Learning, Montefiore Institute, Liège University, Liège, Belgium
| | - Francesca Rapino
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Pierre Close
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Gilles Louppe
- Department of Electrical Engineering and Computer Science, Artificial Intelligence and Deep Learning, Montefiore Institute, Liège University, Liège, Belgium
| | - Liesbet Geris
- Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium
- Skeletal Biology and Engineering Research Center, KU Leuven, Leuven, Belgium
- Biomechanics Section, KU Leuven, Heverlee, Belgium
|
195
|
Liu Z, Dai W, Wang S, Yao Y, Zhang H. Deep learning identified genetic variants for COVID-19-related mortality among 28,097 affected cases in UK Biobank. Genet Epidemiol 2023; 47:215-230. [PMID: 36691909 PMCID: PMC10006374 DOI: 10.1002/gepi.22515] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/13/2022] [Revised: 10/19/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
Analysis of host genetic components provides insights into the susceptibility and response to viral infections such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19). To reveal genetic determinants of susceptibility to COVID-19-related mortality, we train a deep learning model to identify groups of genetic variants and their interactions that contribute to the COVID-19-related mortality risk using the UK Biobank data (28,097 affected cases and 1656 deaths). We refer to such groups of variants as super variants. We identify 15 super variants with various levels of significance as susceptibility loci for COVID-19 mortality. Specifically, we identify a super variant (odds ratio [OR] = 1.594, p = 5.47 × 10⁻⁹) on Chromosome 7 that consists of the minor allele of rs76398985, rs6943608, rs2052130, 7:150989011_CT_C, rs118033050, and rs12540488. We also discover a super variant (OR = 1.353, p = 2.87 × 10⁻⁸) on Chromosome 5 that contains rs12517344, rs72733036, rs190052994, rs34723029, rs72734818, 5:9305797_GTA_G, and rs180899355.
Affiliation(s)
- Zihuan Liu
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Wei Dai
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Shiying Wang
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Yisha Yao
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Heping Zhang
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
|
196
|
Ding K, Dixit G, Parker BJ, Wen J. CRMnet: A deep learning model for predicting gene expression from large regulatory sequence datasets. Front Big Data 2023; 6:1113402. [PMID: 36999047 PMCID: PMC10043243 DOI: 10.3389/fdata.2023.1113402] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2022] [Accepted: 02/23/2023] [Indexed: 03/17/2023] Open
Abstract
Recent large datasets measuring the gene expression of millions of possible gene promoter sequences provide a resource to design and train optimized deep neural network architectures to predict expression from sequences. High predictive performance due to the modeling of dependencies within and between regulatory sequences is an enabler for biological discoveries in gene regulation through model interpretation techniques. To understand the regulatory code that delineates gene expression, we have designed a novel deep-learning model (CRMnet) to predict gene expression in Saccharomyces cerevisiae. Our model outperforms the current benchmark models and achieves a Pearson correlation coefficient of 0.971 and a mean squared error of 3.200. Interpretation of informative genomic regions determined from model saliency maps, and overlapping the saliency maps with known yeast motifs, supports that our model can successfully locate the binding sites of transcription factors that actively modulate gene expression. We compare our model's training times on a large compute cluster with GPUs and Google TPUs to indicate practical training times on similar datasets.
Affiliation(s)
- Ke Ding
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
| | - Gunjan Dixit
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
| | - Brian J. Parker
- School of Computing and Biological Data Science Institute, Australian National University, Canberra, ACT, Australia
- *Correspondence: Brian J. Parker
| | - Jiayu Wen
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Jiayu Wen
|
197
|
de Souza LC, Azevedo KS, de Souza JG, Barbosa RDM, Fernandes MAC. New proposal of viral genome representation applied in the classification of SARS-CoV-2 with deep learning. BMC Bioinformatics 2023; 24:92. [PMID: 36906520 PMCID: PMC10007673 DOI: 10.1186/s12859-023-05188-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 02/15/2023] [Indexed: 03/13/2023] Open
Abstract
BACKGROUND In December 2019, the first case of COVID-19 was described in Wuhan, China, and by July 2022, there were already 540 million confirmed cases. Due to the rapid spread of the virus, the scientific community has made efforts to develop techniques for the viral classification of SARS-CoV-2. RESULTS In this context, we developed a new proposal for gene sequence representation based on Genomic Signal Processing techniques. First, we applied the mapping approach to samples of six viral species of the Coronaviridae family, to which the SARS-CoV-2 virus belongs. We then used the downsized sequences obtained by the proposed method in a deep learning architecture for viral classification, achieving an accuracy of 98.35%, 99.08%, and 99.69% for the 64, 128, and 256 sizes of the viral signatures, respectively, and obtaining 99.95% precision for the vectors with size 256. CONCLUSIONS The classification results obtained, in comparison to the results produced using other state-of-the-art representation techniques, demonstrate that the proposed mapping can provide satisfactory performance with low computational memory and processing time costs.
Affiliation(s)
- Luísa C. de Souza
- Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
| | - Karolayne S. Azevedo
- Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
| | - Jackson G. de Souza
- Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
| | - Raquel de M. Barbosa
- Department of Pharmacy and Pharmaceutical Technology, University of Granada, Granada, Spain
| | - Marcelo A. C. Fernandes
- Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
- Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
- Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal, RN 59078-970 Brazil
|
198
|
Liu W, Zhang L, Bao L, Shen G, Feng J. Accurate Classification and Prediction of Acute Myocardial Infarction through an ARMD Procedure. J Proteome Res 2023; 22:758-767. [PMID: 36710647 DOI: 10.1021/acs.jproteome.2c00488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
The risk stratification of acute myocardial infarction (AMI) patients is of prime importance for clinical management and prognosis assessment. Thus, we propose an ensemble machine learning analysis procedure named ADASYN-RFECV-MDA-DNN (ARMD) to address sample imbalance and enable stratification and prediction of AMI outcomes. The ARMD analysis procedure was applied to the NMR data of sera from 534 AMI-related subjects in four categories with extremely imbalanced sample proportions. First, the adaptive synthetic sampling (ADASYN) algorithm was used to address the imbalance of the original samples. Second, recursive feature elimination with cross-validation (RFECV) and the random forest mean decrease accuracy (RF-MDA) algorithm were used to identify the differential metabolites corresponding to each AMI outcome. Finally, a deep neural network (DNN) was employed to classify and predict AMI events, and its performance was evaluated by comparing it with four traditional machine learning methods. Compared with these four machine learning models, the DNN was consistently superior on almost all of the performance metrics, including precision, F1-score, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and classification accuracy, highlighting the potential of deep learning in the classification and stratification of clinical diseases. The ARMD analysis procedure is a practical tool for supervised classification and regression modeling of clinical diseases.
Affiliation(s)
- Wuping Liu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
| | - Lirong Zhang
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
| | - Lijun Bao
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
| | - Guiping Shen
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
| | - Jianghua Feng
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
|
199
|
Kaur A, Chauhan APS, Aggarwal AK. Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1327-1336. [PMID: 35417351 DOI: 10.1109/tcbb.2022.3167090] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 05/04/2023]
Abstract
An enhancer, a distal cis-regulatory element, controls gene expression. Experimental prediction of enhancer elements is time-consuming and expensive. Consequently, various fast and inexpensive deep learning-based methods have been developed for predicting enhancers and determining their strength. In this paper, we propose a two-stage deep learning-based framework that leverages DNA structural features, natural language processing, a convolutional neural network, and long short-term memory to accurately predict enhancer elements in genomics data. In the first stage, we extract features from DNA sequence data using three feature representation techniques, viz., k-mer-based feature extraction along with word2vec-based interpretation of the underlying patterns, one-hot encoding, and the DNAshape technique. In the second stage, the strength of enhancers is predicted from the extracted features using a hybrid deep learning model. The method is capable of adapting itself to varying dataset sizes. Also, because the proposed model can capture long-range sequence patterns, the method remains robust to minor variations in the genomic sequence. The method outperforms other state-of-the-art methods at both stages in terms of prediction accuracy, specificity, Matthews correlation coefficient, and area under the ROC curve. In summary, the proposed method is a reliable method for enhancer prediction.
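As a minimal illustration of two of the feature representations named in the abstract above (k-mer extraction and one-hot encoding), the following Python sketch may help. The function names, the A/C/G/T column order, and the all-zero encoding of ambiguous bases are assumptions made here for illustration, not details taken from the paper; the word2vec embedding and DNAshape features are not reproduced.

```python
from typing import List

# Assumed column order for the one-hot vectors.
BASES = "ACGT"


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a DNA sequence (stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element 0/1 vector.

    Bases outside A/C/G/T (for example N) become all zeros, which is
    one common convention and an assumption of this sketch.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    example = "ACGTAN"
    print(kmers(example, 3))
    print(one_hot(example))
```

In a pipeline of the kind described, the k-mer list would typically be treated as a sentence of "words" for an embedding model, while the one-hot matrix would feed a convolutional layer directly.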
|
200
|
Tosta S, Moreno K, Schuab G, Fonseca V, Segovia FMC, Kashima S, Elias MC, Sampaio SC, Ciccozzi M, Alcantara LCJ, Slavov SN, Lourenço J, Cella E, Giovanetti M. Global SARS-CoV-2 genomic surveillance: What we have learned (so far). Infect Genet Evol 2023; 108:105405. [PMID: 36681102 PMCID: PMC9847326 DOI: 10.1016/j.meegid.2023.105405] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/13/2022] [Revised: 01/13/2023] [Accepted: 01/17/2023] [Indexed: 01/20/2023]
Abstract
The COVID-19 pandemic has brought significant challenges for genomic surveillance strategies in public health systems worldwide. During the past thirty-four months, many countries faced several epidemic waves of SARS-CoV-2 infections, driven mainly by the emergence and spread of novel variants. In this context, genomic surveillance has been a crucial tool for studying SARS-CoV-2 evolution in real time, for assessing and optimizing novel diagnostic assays, and for improving the efficacy of existing vaccines. During the pandemic, the identification of emerging lineages carrying lineage-specific mutations (particularly those in the receptor-binding domain) showed how these mutations might significantly impact viral transmissibility and protection conferred by prior infection and vaccination. So far, an unprecedented number of SARS-CoV-2 viral genomes has been released in public databases (e.g., GISAID and NCBI), reaching 14 million available genome sequences as of early November 2022. In the present review, we summarise the global landscape of SARS-CoV-2 during the first thirty-four months of viral circulation and evolution. It demonstrates the urgency and importance of sustained investment in genomic surveillance strategies for the timely identification of any emerging viral pathogen or associated variants, which in turn is key to epidemic and pandemic preparedness.
Affiliation(s)
- Stephane Tosta
- Interunit Postgraduate Program in Bioinformatics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Keldenn Moreno
- Interunit Postgraduate Program in Bioinformatics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Gabriel Schuab
- Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil; Laboratório de Flavivirus, Instituto Oswaldo Cruz, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Vagner Fonseca
- Organização Pan-Americana da Saúde/Organização Mundial da Saúde, Brasília, Distrito Federal, Brazil.
| | | | - Simone Kashima
- Blood Center of Ribeirão Preto, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | | | | | - Massimo Ciccozzi
- Unit of Medical Statistics and Molecular Epidemiology, University Campus Bio-Medico of Rome, Italy
| | - Luiz Carlos Junior Alcantara
- Interunit Postgraduate Program in Bioinformatics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Laboratório de Flavivirus, Instituto Oswaldo Cruz, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Svetoslav Nanev Slavov
- Blood Center of Ribeirão Preto, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Butantan Institute, São Paulo, Brazil
| | - José Lourenço
- BioISI (Biosystems and Integrative Sciences Institute), Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
| | - Eleonora Cella
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32827, USA.
| | - Marta Giovanetti
- Interunit Postgraduate Program in Bioinformatics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Laboratório de Flavivirus, Instituto Oswaldo Cruz, Rio de Janeiro, Rio de Janeiro, Brazil; Department of Science and Technology for Humans and the Environment, University of Campus Bio-Medico di Roma, Rome, Italy.
|