1
Tihagam RD, Lou S, Zhao Y, Liu KSY, Singh AT, Koo BI, Przanowski P, Li J, Huang X, Li H, Tushir-Singh J, Fejerman L, Bhatnagar S. The TRIM37 variant rs57141087 contributes to triple-negative breast cancer outcomes in Black women. EMBO Rep 2025; 26:245-272. [PMID: 39614126 PMCID: PMC11723928 DOI: 10.1038/s44319-024-00331-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 11/07/2024] [Accepted: 11/12/2024] [Indexed: 12/01/2024] Open
Abstract
Triple-negative breast cancer (TNBC) disproportionately affects younger Black women, who show more aggressive phenotypes and poorer outcomes than women of other racial identities. While the impact of socioenvironmental inequities within and beyond health systems is well documented, the genetic influence in TNBC-associated racial disparities remains elusive. Here, we report that cancer-free breast tissue from Black women expresses TRIM37 at a significantly higher level relative to White women. A reporter-based screen for regulatory variants identifies a non-coding risk variant rs57141087 in the 5' gene upstream region of the TRIM37 locus with enhancer activity. Mechanistically, rs57141087 increases enhancer-promoter interactions through NRF1, resulting in stronger TRIM37 promoter activity. Phenotypically, high TRIM37 levels drive neoplastic transformations in immortalized breast epithelial cells. Finally, context-dependent TRIM37 expression reveals that early-stage TRIM37 levels affect the initiation and trajectory of breast cancer progression. Together, our results indicate a genotype-informed association of oncogenic TRIM37 with TNBC risk in Black women and implicate TRIM37 as a predictive biomarker to better identify patients at risk of aggressive TNBC.
Affiliation(s)
- Rachisan Djiake Tihagam
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
- Song Lou
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
- Yuanji Zhao
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
- Kammi Song-Yan Liu
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
- Arjun Tushir Singh
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
  - UC Davis Comprehensive Cancer Center, University of California Davis, Davis, CA, 95817, USA
- Bon Il Koo
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
- Piotr Przanowski
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, 22908, USA
- Jie Li
  - UC Davis Bioinformatics Core, University of California at Davis, Davis, CA, 95817, USA
- Xiaosong Huang
  - Department of Public Health Sciences, University of California Davis, Davis, CA, 95817, USA
- Hong Li
  - Department of Public Health Sciences, University of California Davis, Davis, CA, 95817, USA
- Jogender Tushir-Singh
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
  - UC Davis Comprehensive Cancer Center, University of California Davis, Davis, CA, 95817, USA
- Laura Fejerman
  - Department of Public Health Sciences, University of California Davis, Davis, CA, 95817, USA
- Sanchita Bhatnagar
  - Department of Medical Microbiology and Immunology, University of California Davis School of Medicine, Davis, CA, 95616, USA
  - UC Davis Comprehensive Cancer Center, University of California Davis, Davis, CA, 95817, USA
2
Hackenberg M, Harms P, Pfaffenlehner M, Pechmann A, Kirschner J, Schmidt T, Binder H. Deep dynamic modeling with just two time points: Can we still allow for individual trajectories? Biom J 2022; 64:1426-1445. [PMID: 35384018 DOI: 10.1002/bimj.202000366] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/30/2020] [Revised: 12/20/2021] [Accepted: 02/10/2022] [Indexed: 12/14/2022]
Abstract
Longitudinal biomedical data are often characterized by a sparse time grid and individual-specific development patterns. Specifically, in epidemiological cohort studies and clinical registries, we face the question of what can be learned from the data in an early phase of the study, when only a baseline characterization and one follow-up measurement are available. Inspired by recent advances that allow deep learning to be combined with dynamic modeling, we investigate whether such approaches can be useful for uncovering complex structure, in particular in an extreme small-data setting with only two observation time points for each individual. Irregular spacing in time could then be used to gain more information on individual dynamics by leveraging similarity of individuals. We provide a brief overview of how variational autoencoders (VAEs), as a deep learning approach, can be linked to ordinary differential equations (ODEs) for dynamic modeling, and then specifically investigate the feasibility of such an approach that infers individual-specific latent trajectories by including regularity assumptions and individuals' similarity. We also provide a description of this deep learning approach as a filtering task to give a statistical perspective. Using simulated data, we show to what extent the approach can recover individual trajectories from ODE systems with two and four unknown parameters and infer groups of individuals with similar trajectories, and where it breaks down. The results show that such dynamic deep learning approaches can be useful even in extreme small-data settings, but need to be carefully adapted.
Affiliation(s)
- Maren Hackenberg
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Philipp Harms
  - Institute of Mathematics, Faculty of Mathematics and Physics, University of Freiburg, Freiburg, Germany
- Michelle Pfaffenlehner
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Astrid Pechmann
  - Department of Neuropediatrics and Muscle Disorders, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Janbernd Kirschner
  - Department of Neuropediatrics and Muscle Disorders, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Department of Neuropediatrics, University Hospital Bonn, Bonn, Germany
- Thorsten Schmidt
  - Institute of Mathematics, Faculty of Mathematics and Physics, University of Freiburg, Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
3
Hernández-Lorenzo L, Hoffmann M, Scheibling E, List M, Matías-Guiu JA, Ayala JL. On the limits of graph neural networks for the early diagnosis of Alzheimer's disease. Sci Rep 2022; 12:17632. [PMID: 36271229 PMCID: PMC9587223 DOI: 10.1038/s41598-022-21491-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Accepted: 09/28/2022] [Indexed: 01/13/2023] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease whose molecular mechanisms are activated several years before cognitive symptoms appear. Genotype-based prediction of the phenotype is thus a key challenge for the early diagnosis of AD. Machine learning techniques that have been proposed to address this challenge do not consider known biological interactions between the genes used as input features, thus neglecting important information about the disease mechanisms at play. To mitigate this, we first extracted AD subnetworks from several protein-protein interaction (PPI) databases and labeled these with genotype information (number of missense variants) to make them patient-specific. Next, we trained Graph Neural Networks (GNNs) on the patient-specific networks for phenotype prediction. We tested different PPI databases and compared the performance of the GNN models to baseline models using classical machine learning techniques, as well as randomized networks and input datasets. The overall results showed that GNNs could not outperform a baseline predictor only using the APOE gene, suggesting that missense variants are not sufficient to explain disease risk beyond the APOE status. Nevertheless, our results show that GNNs outperformed other machine learning techniques and that protein-protein interactions lead to superior results compared to randomized networks. These findings highlight that gene interactions are a valuable source of information in predicting disease status.
Affiliation(s)
- Laura Hernández-Lorenzo
  - Department of Computer Architecture and Automation, Computer Science Faculty, Complutense University of Madrid, 28040, Madrid, Spain
  - Department of Neurology, Hospital Clínico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, 28040, Madrid, Spain
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Markus Hoffmann
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
  - Institute for Advanced Study, Technical University of Munich, Lichtenbergstrasse 2 a, 85748, Garching, Germany
- Evelyn Scheibling
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Markus List
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Jordi A Matías-Guiu
  - Department of Neurology, Hospital Clínico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, 28040, Madrid, Spain
- Jose L Ayala
  - Department of Computer Architecture and Automation, Computer Science Faculty, Complutense University of Madrid, 28040, Madrid, Spain
4
Köber G, Pooseh S, Engen H, Chmitorz A, Kampa M, Schick A, Sebastian A, Tüscher O, Wessa M, Yuen KSL, Walter H, Kalisch R, Timmer J, Binder H. Individualizing deep dynamic models for psychological resilience data. Sci Rep 2022; 12:8061. [PMID: 35577829 PMCID: PMC9110739 DOI: 10.1038/s41598-022-11650-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2020] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
Deep learning approaches can uncover complex patterns in data. In particular, variational autoencoders achieve this by a non-linear mapping of data into a low-dimensional latent space. Motivated by an application to psychological resilience in the Mainz Resilience Project, which features intermittent longitudinal measurements of stressors and mental health, we propose an approach for individualized, dynamic modeling in this latent space. Specifically, we utilize ordinary differential equations (ODEs) and develop a novel technique for obtaining person-specific ODE parameters even in settings with a rather small number of individuals and observations, incomplete data, and a differing number of observations per individual. This technique allows us to subsequently investigate individual reactions to stimuli, such as the mental health impact of stressors. A potentially large number of baseline characteristics can then be linked to this individual response by regularized regression, e.g., for identifying resilience factors. Thus, our new method provides a way of connecting different kinds of complex longitudinal and baseline measures via individualized, dynamic models. The promising results obtained in the exemplary resilience application indicate that our proposal for dynamic deep learning might also be more generally useful for other application domains.
5
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic producer of big data. This has been made possible by huge advancements in the field of high-throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist of studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
6
Zhao Z, Woloszynek S, Agbavor F, Mell JC, Sokhansanj BA, Rosen GL. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network. PLoS Comput Biol 2021; 17:e1009345. [PMID: 34550967 PMCID: PMC8496832 DOI: 10.1371/journal.pcbi.1009345] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/14/2020] [Revised: 10/07/2021] [Accepted: 08/12/2021] [Indexed: 01/04/2023] Open
Abstract
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
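The role of an attention layer described in this abstract can be illustrated with a toy example. The sketch below is not the Read2Pheno implementation; the function names, the per-position feature vectors, and the raw scores are all hypothetical. It only shows how softmax attention weights collapse per-position features into a single read-level vector while exposing which positions received the most weight.

```python
import math


def softmax(scores):
    """Convert raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Attention-weighted average of per-position feature vectors.

    features: list of equal-length vectors, one per sequence position
    scores:   one raw relevance score per position (hypothetical input;
              in a trained network these would come from a learned layer)
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per position")
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return pooled, weights


# Toy read with three positions and two features per position.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]  # the third position is scored as most relevant
pooled, weights = attention_pool(features, scores)

# Index of the position with the largest attention weight.
most_informative = max(range(len(weights)), key=weights.__getitem__)
```

Inspecting `weights` per position is the kind of readout that lets an attention layer point to informative regions of a sequence; how the scores are learned is outside the scope of this sketch.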
Affiliation(s)
- Zhengqiao Zhao
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
- Stephen Woloszynek
  - Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Felix Agbavor
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America
- Joshua Chang Mell
  - College of Medicine, Drexel University, Philadelphia, Pennsylvania, United States of America
- Bahrad A. Sokhansanj
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
- Gail L. Rosen
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
7
Soumare H, Rezgui S, Gmati N, Benkahla A. New neural network classification method for individuals ancestry prediction from SNPs data. BioData Min 2021; 14:30. [PMID: 34183066 PMCID: PMC8240223 DOI: 10.1186/s13040-021-00258-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/01/2020] [Accepted: 03/29/2021] [Indexed: 11/18/2022] Open
Abstract
Artificial Neural Network (ANN) algorithms have been widely used to analyse genomic data. Single Nucleotide Polymorphisms (SNPs) are the most common genetic variations in the human genome; they have been shown to be involved in many genetic diseases and can be used to predict their development. Developing ANNs to handle this type of data can be considered a great success in the medical world. However, the high dimensionality of genomic data and the limited number of available samples can make the learning task very complicated. In this work, we propose a new neural network classification method based on input perturbation. The idea is first to use singular value decomposition (SVD) to reduce the dimensionality of the input data and to train a classification network, whose prediction errors are then reduced by perturbing the SVD projection matrix. The proposed method has been evaluated on data from individuals with different ancestral origins, and the experimental results show its effectiveness. Achieving up to 96.23% classification accuracy, this approach surpasses previous deep learning approaches evaluated on the same dataset.
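The first step named in this abstract, reducing input dimensionality with SVD, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the genotype matrix is random toy data coded 0/1/2, the column centering is an assumption, the function name `svd_project` is hypothetical, and neither the classification network nor the perturbation of the projection matrix is reproduced here.

```python
import numpy as np


def svd_project(X, k):
    """Project the rows of X onto the top-k right singular vectors.

    Returns the reduced data Z, the projection matrix V_k, and the column
    means used for centering (centering is an assumption of this sketch).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T   # shape (n_features, k)
    Z = Xc @ V_k     # shape (n_samples, k)
    return Z, V_k, mean


# Hypothetical toy genotype matrix: 20 individuals, 100 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 100)).astype(float)

Z, V_k, mean = svd_project(X, k=5)
```

A classifier would then be trained on `Z` instead of `X`, so it sees 5 inputs per individual rather than 100. New samples would be mapped with the same `mean` and `V_k`.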
Affiliation(s)
- H. Soumare
  - The Laboratory of Mathematical Modelling and Numeric in Engineering Sciences, National Engineering School of Tunis, Rue Béchir Salem Belkhiria Campus universitaire, B.P. 37, 1002 Tunis Belvédère, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of BioInformatics, bioMathematics, and bioStatistics, 13 place Pasteur, B.P. 74 1002 Tunis, Belvédère, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- S. Rezgui
  - ADAGOS. Le Belvédère centre, 61 rue El Khartoum, El Menzah, Tunis, Tunisia
- N. Gmati
  - College of sciences & Basic and Applied Scientific Research Center, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, Dammam, Kingdom of Saudi Arabia, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
- A. Benkahla
  - Laboratory of BioInformatics, bioMathematics, and bioStatistics, 13 place Pasteur, B.P. 74 1002 Tunis, Belvédère, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
8
Lenz S, Hess M, Binder H. Deep generative models in DataSHIELD. BMC Med Res Methodol 2021; 21:64. [PMID: 33812380 PMCID: PMC8019187 DOI: 10.1186/s12874-021-01237-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/07/2020] [Accepted: 02/23/2021] [Indexed: 03/10/2023] Open
Abstract
Background: The best way to calculate statistics from medical data is to use the data of individual patients. In some settings, this data is difficult to obtain due to privacy restrictions. In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients.
Methods: The DataSHIELD software provides an infrastructure and a set of statistical methods for joint, privacy-preserving analyses of distributed data. The contained algorithms are reformulated to work with aggregated data from the participating sites instead of the individual data. If a desired algorithm is not implemented in DataSHIELD or cannot be reformulated in such a way, using artificial data is an alternative. Generating artificial data is possible using so-called generative models, which are able to capture the distribution of given data. Here, we employ deep Boltzmann machines (DBMs) as generative models. For the implementation, we use the package “BoltzmannMachines” from the Julia programming language and wrap it for use with DataSHIELD, which is based on R.
Results: We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data. Such data sets of artificial patients, which are not linked to real patients, can then be used for joint analyses. As an exemplary application, we conduct a distributed analysis with DBMs on a synthetic data set, which simulates genetic variant data. Patterns from the original data can be recovered in the artificial data using hierarchical clustering of the virtual patients, demonstrating the feasibility of the approach. Additionally, we compare DBMs, variational autoencoders, generative adversarial networks, and multivariate imputation as generative approaches by assessing the utility and disclosure of synthetic data generated from real genetic variant data in a distributed setting with data of a small sample size.
Conclusions: Our implementation adds to DataSHIELD the ability to generate artificial data that can be used for various analyses, e.g., for pattern recognition with deep learning. This also demonstrates more generally how DataSHIELD can be flexibly extended with advanced algorithms from languages other than R.
Affiliation(s)
- Stefan Lenz
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Moritz Hess
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
9
Hess M, Hackenberg M, Binder H. Exploring generative deep learning for omics data using log-linear models. Bioinformatics 2020; 36:5045-5053. [PMID: 32647888 PMCID: PMC7755415 DOI: 10.1093/bioinformatics/btaa623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/06/2019] [Revised: 06/28/2020] [Accepted: 07/02/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Following many successful applications to image data, deep learning is now also increasingly considered for omics data. In particular, generative deep learning not only provides competitive prediction performance, but also allows for uncovering structure by generating synthetic samples. However, exploration and visualization are not as straightforward as with image applications.
RESULTS: We demonstrate how log-linear models, fitted to the generated synthetic data, can be used to extract patterns from omics data, learned by deep generative techniques. Specifically, interactions between latent representations learned by the approaches and generated synthetic data are used to determine sets of joint patterns. Distances of patterns with respect to the distribution of latent representations are then visualized in low-dimensional coordinate systems, e.g. for monitoring training progress. This is illustrated with simulated data and subsequently with cortical single-cell gene expression data. Using different kinds of deep generative techniques, specifically variational autoencoders and deep Boltzmann machines, the proposed approach highlights how the techniques uncover underlying structure. It facilitates the real-world use of such generative deep learning techniques to gain biological insights from omics data.
AVAILABILITY AND IMPLEMENTATION: The code for the approach as well as an accompanying Jupyter notebook, which illustrates the application of our approach, is available via the GitHub repository: https://github.com/ssehztirom/Exploring-generative-deep-learning-for-omics-data-by-using-log-linear-models.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moritz Hess
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
- Maren Hackenberg
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
10
Franco P, Würtemberger U, Dacca K, Hübschle I, Beck J, Schnell O, Mader I, Binder H, Urbach H, Heiland DH. SPectroscOpic prediction of bRain Tumours (SPORT): study protocol of a prospective imaging trial. BMC Med Imaging 2020; 20:123. [PMID: 33228567 PMCID: PMC7685595 DOI: 10.1186/s12880-020-00522-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/18/2020] [Accepted: 11/15/2020] [Indexed: 12/26/2022] Open
Abstract
Background: The revised 2016 WHO-Classification of CNS-tumours now integrates molecular information of glial brain tumours for accurate diagnosis as well as for the development of targeted therapies. In this prospective study, our aim is to investigate the predictive value of MR-spectroscopy in order to establish a solid preoperative molecular stratification algorithm of these tumours. We will process a 1H MR-spectroscopy sequence within a radiomics analytics pipeline.
Methods: Patients treated at our institution with WHO-Grade II, III and IV gliomas will receive preoperative anatomical (T2- and T1-weighted imaging with and without contrast enhancement) and proton MR spectroscopy (MRS) by using chemical shift imaging (MRS) (5 × 5 × 15 mm³ voxel size). Tumour regions will be segmented and co-registered to corresponding spectroscopic voxels. Raw signals will be processed by a deep-learning approach to identify patterns in metabolic data that provide information with respect to the histological diagnosis as well as the patient characteristics obtained and genomic data such as target sequencing and transcriptional data.
Discussion: By imaging the metabolic profile of a glioma using a customized chemical shift 1H MR spectroscopy sequence and by processing the metabolic profiles with a machine learning tool, we intend to non-invasively uncover the genetic signature of gliomas. This work-up will support surgical and oncological decisions to improve personalized tumour treatment.
Trial registration: This study was initially registered under another name and was later retrospectively registered under the current name at the German Clinical Trials Register (DRKS) under DRKS00019855.
Affiliation(s)
- Pamela Franco
- Department of Neurosurgery, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Urs Würtemberger
- Department of Neuroradiology, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Karam Dacca
- Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Irene Hübschle
- Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Jürgen Beck
- Department of Neurosurgery, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Oliver Schnell
- Department of Neurosurgery, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Irina Mader
- Specialist Centre for Radiology, Schoen Clinic, Vogtareuth, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Harald Binder
- Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany; Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Centre, University of Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg im Breisgau, Germany
- Horst Urbach
- Department of Neuroradiology, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
- Dieter Henrik Heiland
- Department of Neurosurgery, Medical Centre, University of Freiburg, Breisacher Str. 64, 79106, Freiburg im Breisgau, Germany; Faculty of Medicine, University of Freiburg, Breisacher Str. 153, 79110, Freiburg im Breisgau, Germany
|
11
|
Nußberger J, Boesel F, Lenz S, Binder H, Hess M. Synthetic observations from deep generative models and binary omics data with limited sample size. Brief Bioinform 2020; 22:5917048. [PMID: 33003196 DOI: 10.1093/bib/bbaa226] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/12/2020] [Revised: 08/06/2020] [Accepted: 08/21/2020] [Indexed: 01/19/2023]
Abstract
Deep generative models can be trained to represent the joint distribution of data, such as measurements of single nucleotide polymorphisms (SNPs) from several individuals. Subsequently, synthetic observations are obtained by drawing from this distribution. This has been shown to be useful for several tasks, such as removal of noise, imputation, for better understanding underlying patterns, or even exchanging data under privacy constraints. Yet, it is still unclear how well these approaches work with limited sample size. We investigate such settings specifically for binary data, e.g. as relevant when considering SNP measurements, and evaluate three frequently employed generative modeling approaches, variational autoencoders (VAEs), deep Boltzmann machines (DBMs) and generative adversarial networks (GANs). This includes conditional approaches, such as when considering gene expression conditional on SNPs. Recovery of pair-wise odds ratios (ORs) is considered as a primary performance criterion. For simulated as well as real SNP data, we observe that DBMs generally can recover structure for up to 300 variables, with a tendency of over-estimating ORs when not carefully tuned. VAEs generally get the direction and relative strength of pairwise relations right, yet with considerable under-estimation of ORs. GANs provide stable results only with larger sample sizes and strong pair-wise relations in the data. Taken together, DBMs and VAEs (in contrast to GANs) appear to be well suited for binary omics data, even at rather small sample sizes. This opens the way for many potential applications where synthetic observations from omics data might be useful.
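The abstract names recovery of pairwise odds ratios (ORs) as its primary performance criterion. As a rough illustration of what that quantity is for binary data, the following sketch computes the OR for every pair of binary variables from a list of observations. The function name `pairwise_odds_ratios` and the `pseudocount` parameter are hypothetical and not taken from the paper; the default of 0.5 added to each cell (the Haldane-Anscombe correction) is an assumption made here to avoid division by zero, not necessarily what the authors did.

```python
from itertools import combinations


def pairwise_odds_ratios(rows, pseudocount=0.5):
    """Return {(i, j): odds ratio} for every pair of binary columns.

    rows        : list of equal-length sequences of 0/1 values
    pseudocount : value added to each cell of the 2x2 table
                  (assumption: 0.5, the Haldane-Anscombe correction)
    """
    n_vars = len(rows[0])
    result = {}
    for i, j in combinations(range(n_vars), 2):
        # Build the 2x2 contingency table for variables i and j.
        n11 = n10 = n01 = n00 = 0
        for r in rows:
            if r[i] and r[j]:
                n11 += 1
            elif r[i]:
                n10 += 1
            elif r[j]:
                n01 += 1
            else:
                n00 += 1
        a, b, c, d = (x + pseudocount for x in (n11, n10, n01, n00))
        # OR = (both present * both absent) / (only i * only j)
        result[(i, j)] = (a * d) / (b * c)
    return result


# Toy data: two binary variables observed in six individuals.
observations = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]
ors = pairwise_odds_ratios(observations, pseudocount=0.0)
```

Applying the same function to real and to synthetic observations and comparing the resulting values pair by pair is one plausible way to judge how well a generative model preserves pairwise structure.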
Affiliation(s)
- Jens Nußberger
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Frederic Boesel
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Stefan Lenz
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Moritz Hess
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
|
12
|
Yin B, Balvert M, van der Spek RAA, Dutilh BE, Bohté S, Veldink J, Schönhuth A. Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. Bioinformatics 2020; 35:i538-i547. [PMID: 31510706 PMCID: PMC6612814 DOI: 10.1093/bioinformatics/btz369] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/13/2022]
Abstract
Motivation Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype–phenotype association studies. Deep learning on the other hand is highly promising for identifying such complex relations. We therefore developed a deep-learning based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first promoter regions that are likely associated to ALS are identified, and second individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective. Results Our approach identifies potentially ALS-associated promoter regions, and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. Architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep learning assisted genotype–phenotype association in whole genome-sized data. Availability and implementation Our code will be available on Github, together with a synthetic dataset (https://github.com/byin-cwi/ALS-Deeplearning). The data used in this study is available to bona-fide researchers upon request. 
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bojian Yin
- Centrum Wiskunde & Informatica, Life Sciences & Health, XG Amsterdam, The Netherlands
- Marleen Balvert
- Centrum Wiskunde & Informatica, Life Sciences & Health, XG Amsterdam, The Netherlands; Theoretical Biology & Bioinformatics, Utrecht University, JE Utrecht, The Netherlands
- Rick A A van der Spek
- Department of Neurology, Brain Center Rudolf Magnus University Medical Center Utrecht, Utrecht, The Netherlands
- Bas E Dutilh
- Theoretical Biology & Bioinformatics, Utrecht University, JE Utrecht, The Netherlands
- Sander Bohté
- Centrum Wiskunde & Informatica, Life Sciences & Health, XG Amsterdam, The Netherlands
- Jan Veldink
- Department of Neurology, Brain Center Rudolf Magnus University Medical Center Utrecht, Utrecht, The Netherlands
- Alexander Schönhuth
- Centrum Wiskunde & Informatica, Life Sciences & Health, XG Amsterdam, The Netherlands; Theoretical Biology & Bioinformatics, Utrecht University, JE Utrecht, The Netherlands
|
13
|
Wang H, Li G. Extreme learning machine Cox model for high-dimensional survival analysis. Stat Med 2019; 38:2139-2156. [PMID: 30632193 PMCID: PMC6498851 DOI: 10.1002/sim.8090] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/04/2018] [Revised: 10/11/2018] [Accepted: 12/12/2018] [Indexed: 11/07/2022]
Abstract
Some interesting recent studies have shown that neural network models are useful alternatives in modeling survival data when the assumptions of a classical parametric or semiparametric survival model such as the Cox (1972) model are seriously violated. However, to the best of our knowledge, the plausibility of adapting the emerging extreme learning machine (ELM) algorithm for single-hidden-layer feedforward neural networks to survival analysis has not been explored. In this paper, we present a kernel ELM Cox model regularized by an L0-based broken adaptive ridge (BAR) penalization method. Then, we demonstrate that the resulting method, referred to as ELMCoxBAR, can outperform some other state-of-the-art survival prediction methods such as L1- or L2-regularized Cox regression, random survival forest with various splitting rules, and boosted Cox model, in terms of its predictive performance using both simulated and real-world datasets. In addition to its good predictive performance, we illustrate that the proposed method has a key computational advantage over the above competing methods in terms of computation time efficiency using real-world ultra-high-dimensional survival data.
Affiliation(s)
- Hong Wang
- School of Mathematics and Statistics, Central South University, Changsha, China
- Gang Li
- Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, California
|
14
|
Wang H, Zhou L. SurvELM: An R package for high dimensional survival analysis with extreme learning machine. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.07.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]
|
15
|
Mechanisms and modulators of cognitive training gain transfer in cognitively healthy aging: study protocol of the AgeGain study. Trials 2018; 19:337. [PMID: 29945638 PMCID: PMC6020358 DOI: 10.1186/s13063-018-2688-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/17/2018] [Accepted: 05/15/2018] [Indexed: 11/19/2022]
Abstract
Background Cognitively healthy older people can increase their performance in cognitive tasks through training. However, training effects are mostly limited to the trained task; thus, training effects only poorly transfer to untrained tasks or other contexts, which contributes to reduced adaptation abilities in aging. Stabilizing transfer capabilities in aging would increase the chance of persistent high performance in activities of daily living including longer independency, and prolonged active participation in social life. The trial AgeGain aims at elaborating the physiological brain mechanisms of transfer in aging and supposed major modulators of transfer capability, especially physical activity, cerebral vascular lesions, and amyloid burden. Methods This 4-year interventional, multicenter, phase 2a cognitive and physical training study will enroll 237 cognitively healthy older subjects in four recruiting centers. The primary endpoint of this trial is the prediction of transfer of cognitive training gains. Secondary endpoints are the structural connectivity of the corpus callosum, Default Mode Network activity, brain-derived neurotrophic factors, motor fitness, and maximal oxygen uptake. Discussion Cognitive transfer allows making use of cognitive training gains in everyday life. Thus, maintenance of transfer capability with aging increases the chance of persistent self-guidance and prolonged active participation in social life, which may support a good quality of life. The AgeGain study aims at identifying older people who will most benefit from cognitive training. It will increase the understanding of the neurobiological mechanisms of transfer in aging and will help in determining the impact of physical activity and sport as well as pathologic factors (such as cerebrovascular disease and amyloid load) on transfer capability. Trial registration German Clinical Trials Register (DRKS), ID: DRKS00013077. Registered on 19 November 2017. 
Electronic supplementary material The online version of this article (10.1186/s13063-018-2688-2) contains supplementary material, which is available to authorized users.
|
16
|
Heterogeneity Analysis and Diagnosis of Complex Diseases Based on Deep Learning Method. Sci Rep 2018; 8:6155. [PMID: 29670206 PMCID: PMC5906634 DOI: 10.1038/s41598-018-24588-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/14/2018] [Accepted: 04/05/2018] [Indexed: 12/26/2022]
Abstract
Understanding the genetic mechanisms of complex diseases is a serious challenge. Existing methods often neglect the heterogeneity of complex diseases, resulting in a lack of power or low reproducibility. Addressing heterogeneity when detecting epistatic single nucleotide polymorphisms (SNPs) can enhance the power of association studies and improve the prediction performance of complex disease diagnosis. In this study, we propose a three-stage framework comprising epistasis detection, clustering and prediction to address both epistasis and heterogeneity of complex diseases based on a deep learning method. The epistasis detection stage applies a multi-objective optimization method to find several candidate sets of epistatic SNPs which contribute to different subtypes of complex diseases. Then, a K-means clustering algorithm is used to define subtypes of the case group. Finally, a deep learning model is trained on a graphics processing unit (GPU) for disease prediction. Experimental results on pure and heterogeneous datasets show that our method has potential practicality and can serve as a possible alternative to other methods. Therefore, when epistasis and heterogeneity exist at the same time, our method is especially suitable for diagnosis of complex diseases.
|
17
|
|