1
Sergouniotis PI, Diakite A, Gaurav K, Birney E, Fitzgerald T. Autoencoder-based phenotyping of ophthalmic images highlights genetic loci influencing retinal morphology and provides informative biomarkers. Bioinformatics 2024; 41:btae732. [PMID: 39657956 PMCID: PMC11751639 DOI: 10.1093/bioinformatics/btae732] [Received: 06/17/2024] [Revised: 10/08/2024] [Accepted: 12/11/2024] [Indexed: 12/12/2024]
Abstract
MOTIVATION Genome-wide association studies (GWAS) have been remarkably successful in identifying associations between genetic variants and imaging-derived phenotypes. To date, the main focus of these analyses has been on established, clinically used imaging features. We sought to investigate whether deep learning approaches can detect more nuanced patterns of image variability. RESULTS We used an autoencoder to represent retinal optical coherence tomography (OCT) images from 31,135 UK Biobank participants. For each subject, we obtained a 64-dimensional vector representing features of retinal structure. GWAS of these autoencoder-derived imaging parameters identified 118 statistically significant loci; 41 of these associations were also significant in a replication study. These loci encompassed variants previously linked with retinal thickness measurements, ophthalmic disorders, and/or neurodegenerative conditions. Notably, the generated retinal phenotypes were found to contribute to predictive models for glaucoma and cardiovascular disorders. Overall, we demonstrate that self-supervised phenotyping of OCT images enhances the discoverability of genetic factors influencing retinal morphology and provides epidemiologically informative biomarkers. AVAILABILITY AND IMPLEMENTATION Code and data links are available at https://github.com/tf2/autoencoder-oct.
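As a rough illustration of what a GWAS of autoencoder-derived phenotypes involves, the sketch below treats each latent dimension as a separate quantitative trait and tests every SNP against every dimension. It is a simplified stand-in, not the authors' pipeline: it assumes no covariates (no age, sex, or genetic principal components), uses a normal approximation to the t distribution that is only reasonable for large samples, and runs on synthetic data. The function name `marginal_scan` is hypothetical.

```python
import math

import numpy as np

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def marginal_scan(dosages: np.ndarray, latents: np.ndarray) -> np.ndarray:
    """Two-sided p-values for every (SNP, latent dimension) pair.

    dosages: (n_samples, n_snps) allele dosages in [0, 2]; monomorphic SNPs
             should be filtered out first (zero variance gives NaN).
    latents: (n_samples, n_dims) autoencoder-derived phenotypes.
    Simplifications (assumptions, not the published pipeline): no covariates,
    and a normal approximation to the t distribution, fine only for large n.
    """
    n = dosages.shape[0]
    g = (dosages - dosages.mean(axis=0)) / dosages.std(axis=0)
    y = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    r = g.T @ y / n  # Pearson correlation for each SNP/dimension pair
    t = r * np.sqrt((n - 2) / (1.0 - r**2))  # simple-regression t statistic
    erfc = np.vectorize(math.erfc)
    return erfc(np.abs(t) / math.sqrt(2.0))  # P(|Z| > |t|) for Z ~ N(0, 1)


# Synthetic example: one planted effect of SNP 7 on latent dimension 0.
rng = np.random.default_rng(0)
n, n_snps, n_dims = 5000, 50, 4
dosages = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
latents = rng.normal(size=(n, n_dims))
latents[:, 0] += 0.5 * dosages[:, 7]
pvals = marginal_scan(dosages, latents)
hits = np.argwhere(pvals < GENOME_WIDE_ALPHA)  # rows of [snp_index, dim_index]
```

A real analysis would adjust for covariates and relatedness with a dedicated GWAS tool, and testing many latent dimensions may call for a stricter threshold than 5×10⁻⁸ alone.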
Affiliation(s)
- Panagiotis I Sergouniotis
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9NT, United Kingdom
- Manchester Centre for Genomic Medicine, Saint Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester M13 9WL, United Kingdom
- Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester M13 9WL, United Kingdom
- Adam Diakite
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom
- Kumar Gaurav
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom
- Ewan Birney
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom
- Tomas Fitzgerald
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom
2
Rakowski A, Monti R, Lippert C. TransferGWAS of T1-weighted brain MRI data from UK Biobank. PLoS Genet 2024; 20:e1011332. [PMID: 39671448 DOI: 10.1371/journal.pgen.1011332] [Received: 06/07/2024] [Revised: 12/31/2024] [Accepted: 11/07/2024] [Indexed: 12/15/2024]
Abstract
Genome-wide association studies (GWAS) traditionally analyze single traits, e.g., disease diagnoses or biomarkers. Nowadays, large-scale cohorts such as UK Biobank (UKB) collect imaging data with sample sizes large enough to perform genetic association testing. Typical approaches to GWAS on high-dimensional modalities extract predefined features from the data, e.g., volumes of regions of interest. This limits the scope of such studies to predefined traits and can ignore novel patterns present in the data. TransferGWAS employs deep neural networks (DNNs) to extract low-dimensional representations of imaging data for GWAS, eliminating the need for predefined biomarkers. Here, we apply transferGWAS to brain MRI data from UKB. We encoded 36,311 T1-weighted brain magnetic resonance imaging (MRI) scans using DNN models trained on MRI scans from the Alzheimer's Disease Neuroimaging Initiative and on natural images from the ImageNet dataset, and performed a multivariate GWAS on the resulting features. We identified 289 independent loci, associated with, among others, bone density, brain, and cardiovascular traits, and 11 regions with no previously reported associations. We fitted polygenic scores (PGS) of the deep features, which improved predictions of bone mineral density and several other traits in a multi-PGS setting, and computed genetic correlations with selected phenotypes, which pointed to novel links between diffusion MRI traits and type 2 diabetes. Overall, our findings provide evidence that features learned with DNN models can uncover additional heritable variability in the human brain beyond the predefined measures, and link them to a range of non-brain phenotypes.
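The abstract does not spell out how the DNN activations become GWAS traits. A minimal sketch, assuming the activations are reduced with principal component analysis and each SNP is then tested against all retained components jointly: the reverse-regression test below (regress the SNP's dosage on the k components; under the null, n·R² is approximately χ² with k degrees of freedom) is one simple multivariate test, swapped in here for illustration and not necessarily the test used in the paper. All function names are hypothetical and the data are synthetic.

```python
import math

import numpy as np


def pca_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project (n_samples, n_features) activations onto their top-k PCs."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square survival function, closed form valid for even df only."""
    assert df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def multivariate_snp_test(dosage: np.ndarray, pcs: np.ndarray) -> float:
    """Reverse regression: regress one SNP's dosage on all k PCs at once.

    Under no association, n * R^2 is approximately chi-square with k df.
    """
    n, k = pcs.shape
    X = np.column_stack([np.ones(n), pcs])
    beta, *_ = np.linalg.lstsq(X, dosage, rcond=None)
    resid = dosage - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((dosage - dosage.mean()) ** 2)
    return chi2_sf_even_df(n * r2, k)


rng = np.random.default_rng(1)
n = 2000
features = rng.normal(size=(n, 64))  # stand-in for pretrained-network activations
snp = rng.binomial(2, 0.4, size=n).astype(float)
features[:, :8] += 0.5 * snp[:, None]  # planted genotype effect on 8 features
pcs = pca_reduce(features, k=10)
p_assoc = multivariate_snp_test(snp, pcs)
p_null = multivariate_snp_test(rng.binomial(2, 0.4, size=n).astype(float), pcs)
```

Testing all components jointly gives one p-value per SNP, which avoids a separate multiple-testing burden for each feature.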
Affiliation(s)
- Alexander Rakowski
- Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany
- Remo Monti
- Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Christoph Lippert
- Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, New York, New York, United States of America
3
Friedman SF, Moran GE, Rakic M, Phillipakis A. Genetic Architectures of Medical Images Revealed by Registration of Multiple Modalities. Bioinform Biol Insights 2024; 18:11779322241282489. [PMID: 39372505 PMCID: PMC11450573 DOI: 10.1177/11779322241282489] [Received: 01/16/2024] [Accepted: 08/16/2024] [Indexed: 10/08/2024]
Abstract
The advent of biobanks with vast quantities of medical imaging and paired genetic measurements creates huge opportunities for a new generation of genotype-phenotype association studies. However, disentangling biological signals from the many sources of bias and artifacts remains difficult. Using diverse medical images and time series (ie, magnetic resonance imaging [MRI], electrocardiograms [ECGs], and dual-energy X-ray absorptiometry [DXA] scans), we show how registration, both spatial and temporal, guided by domain knowledge or learned de novo, helps uncover biological information. A multimodal autoencoder comparison framework quantifies and characterizes how registration affects the representations that unsupervised and self-supervised encoders learn. In this study we (1) train autoencoders before and after registration with nine diverse types of medical image, (2) demonstrate how neural network-based methods (VoxelMorph, DeepCycle, and DropFuse) can effectively learn registrations, allowing more flexible and efficient processing than is possible with hand-crafted registration techniques, and (3) conduct exhaustive phenotypic screening, comprising millions of statistical tests, to quantify how registration affects the generalizability of learned representations. Genome- and phenome-wide association studies (GWAS and PheWAS) uncover significantly more associations with registered modality representations than with equivalently trained and sized representations learned from native coordinate spaces. Specifically, registered PheWAS yielded 61 more disease associations for ECGs, 53 more disease associations for cardiac MRIs, and 10 more disease associations for brain MRIs. Registration also yields significant increases in the coefficient of determination when regressing continuous phenotypes (eg, 0.36 ± 0.01 for ECGs and 0.11 ± 0.02 for DXA scans).
Our findings reveal the crucial role registration plays in enhancing the characterization of physiological states across a broad range of medical imaging data types. Importantly, this finding extends to more flexible types of registration, such as the cross-modal and the circular mapping methods presented here.
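The coefficient-of-determination comparison can be sketched as a held-out regression of a continuous phenotype on each set of learned representations. This assumes ordinary least squares with an intercept and a random 80/20 split; the abstract does not state the regression model or split, and the two embedding matrices below are synthetic stand-ins rather than registered and native-space encodings. The function name `held_out_r2` is hypothetical.

```python
import numpy as np


def held_out_r2(embeddings: np.ndarray, phenotype: np.ndarray,
                train_frac: float = 0.8, seed: int = 0) -> float:
    """Fit OLS on a random training split and return R^2 on the held-out split."""
    rng = np.random.default_rng(seed)
    n = len(phenotype)
    idx = rng.permutation(n)
    cut = int(train_frac * n)
    train, test = idx[:cut], idx[cut:]
    X = np.column_stack([np.ones(n), embeddings])  # intercept + embedding dims
    beta, *_ = np.linalg.lstsq(X[train], phenotype[train], rcond=None)
    pred = X[test] @ beta
    ss_res = np.sum((phenotype[test] - pred) ** 2)
    ss_tot = np.sum((phenotype[test] - phenotype[test].mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(2)
n = 3000
signal = rng.normal(size=n)
# One embedding carries a noisy copy of the phenotype's signal; the other does not.
informative = np.column_stack([signal + 0.5 * rng.normal(size=n),
                               rng.normal(size=(n, 15))])
uninformative = rng.normal(size=(n, 16))
phenotype = signal + rng.normal(size=n)
r2_a = held_out_r2(informative, phenotype)
r2_b = held_out_r2(uninformative, phenotype)
```

Scoring on held-out data matters here: in-sample R² always rises with more dimensions, so it would not distinguish a genuinely more informative representation from a larger one.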
Affiliation(s)
- Marianne Rakic
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Anthony Phillipakis
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
4
Mukherjee S, McCaw ZR, Pei J, Merkoulovitch A, Soare T, Tandon R, Amar D, Somineni H, Klein C, Satapati S, Lloyd D, Probert C, Insitro Research Team, Koller D, O’Dushlaine C, Karaletsos T. EmbedGEM: a framework to evaluate the utility of embeddings for genetic discovery. Bioinform Adv 2024; 4:vbae135. [PMID: 39664859 PMCID: PMC11632179 DOI: 10.1093/bioadv/vbae135] [Received: 07/31/2024] [Accepted: 09/13/2024] [Indexed: 12/13/2024]
Abstract
Summary Machine learning-derived embeddings are compressed representations of high-content data modalities. Embeddings can capture detailed information about disease states and have been qualitatively shown to be useful in genetic discovery. Despite their promise, embeddings have a major limitation: it is unclear whether genetic variants associated with embeddings are relevant to the disease or trait of interest. In this work, we describe EmbedGEM (Embedding Genetic Evaluation Methods), a framework to systematically evaluate the utility of embeddings in genetic discovery. EmbedGEM compares embeddings along two axes: heritability and disease relevance. As measures of heritability, we consider the number of genome-wide significant associations and the mean χ² statistic at significant loci. For disease relevance, we compute polygenic risk scores for each embedding principal component, then evaluate their association with high-confidence disease or trait labels in a held-out evaluation patient set. While our development of EmbedGEM is motivated by embeddings, the approach is generally applicable to multivariate traits and can readily be extended to accommodate additional metrics along the evaluation axes. We demonstrate EmbedGEM's utility by evaluating embeddings and multivariate traits in two separate datasets: (i) a synthetic dataset simulated to demonstrate the ability of the framework to correctly rank traits based on their heritability and disease relevance and (ii) real data from the UK Biobank, including metabolic and liver-related traits. Importantly, we show that greater disease relevance does not automatically follow from greater heritability. Availability and implementation https://github.com/insitro/EmbedGEM.
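The two evaluation axes can be sketched as follows, assuming per-SNP GWAS z-scores for one embedding principal component and a held-out evaluation set with genotypes and labels. For brevity, the heritability metrics count individual significant SNPs rather than clumped loci, and disease relevance is summarized as a plain correlation between the PRS and the labels; both are simplifications, and the function names are hypothetical rather than part of the EmbedGEM package.

```python
import math

import numpy as np

GW_ALPHA = 5e-8  # genome-wide significance threshold


def heritability_metrics(z: np.ndarray) -> tuple[int, float]:
    """Return (number of genome-wide significant SNPs, mean chi-square among them).

    z: 1-D array of per-SNP GWAS z-scores for one embedding PC.
    Simplification: counts SNPs, not LD-clumped loci.
    """
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])  # two-sided
    sig = p < GW_ALPHA
    n_sig = int(sig.sum())
    mean_chi2 = float(np.mean(z[sig] ** 2)) if n_sig else 0.0  # chi2 = z^2 (1 df)
    return n_sig, mean_chi2


def prs_relevance(dosages_eval: np.ndarray, betas: np.ndarray,
                  labels: np.ndarray) -> float:
    """Correlation between a PRS (dosages @ betas) and labels in a held-out set."""
    prs = dosages_eval @ betas
    return float(np.corrcoef(prs, labels)[0, 1])


z = np.array([0.3, -1.2, 6.0, -7.5, 2.0])  # made-up z-scores
n_sig, mean_chi2 = heritability_metrics(z)

rng = np.random.default_rng(3)
dosages_eval = rng.binomial(2, 0.3, size=(1000, 20)).astype(float)
betas = rng.normal(scale=0.1, size=20)  # stand-in PRS weights for one PC
liability = dosages_eval @ betas + rng.normal(scale=0.2, size=1000)
labels = (liability > np.median(liability)).astype(float)  # synthetic binary labels
rho = prs_relevance(dosages_eval, betas, labels)
```

Keeping the two scores separate mirrors the abstract's point that an embedding can rank high on heritability while ranking low on disease relevance.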
Affiliation(s)
- Sumit Mukherjee
- Insitro Inc, South San Francisco, California 94080, United States
- Zachary R McCaw
- Insitro Inc, South San Francisco, California 94080, United States
- Jingwen Pei
- Insitro Inc, South San Francisco, California 94080, United States
- Tom Soare
- Insitro Inc, South San Francisco, California 94080, United States
- Raghav Tandon
- Center for Machine Learning, Georgia Institute of Technology, Georgia 30332, United States
- David Amar
- Insitro Inc, South San Francisco, California 94080, United States
- Hari Somineni
- Insitro Inc, South San Francisco, California 94080, United States
- Christoph Klein
- Insitro Inc, South San Francisco, California 94080, United States
- David Lloyd
- Insitro Inc, South San Francisco, California 94080, United States
- Daphne Koller
- Insitro Inc, South San Francisco, California 94080, United States
- Colm O’Dushlaine
- Insitro Inc, South San Francisco, California 94080, United States
5
Yun T, Cosentino J, Behsaz B, McCaw ZR, Hill D, Luben R, Lai D, Bates J, Yang H, Schwantes-An TH, Zhou Y, Khawaja AP, Carroll A, Hobbs BD, Cho MH, McLean CY, Hormozdiari F. Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction. Nat Genet 2024; 56:1604-1613. [PMID: 38977853 PMCID: PMC11319202 DOI: 10.1038/s41588-024-01831-6] [Received: 05/18/2023] [Accepted: 06/13/2024] [Indexed: 07/10/2024]
Abstract
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very few labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.
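To make the VAE step concrete, the sketch below computes the negative evidence lower bound (ELBO) for a toy VAE with a linear encoder and decoder, using the reparameterization trick and the closed-form Gaussian KL term; the per-individual posterior means `mu` are what would be carried forward as GWAS phenotypes. The linear layers, the squared-error reconstruction term, and the `beta` weight on the KL term (β > 1 is one common way to encourage disentanglement) are assumptions for illustration; REGLE's actual architecture and loss are not reproduced here. Only the forward pass is shown; training would need an autodiff framework.

```python
import numpy as np


def vae_loss(x, enc_w, dec_w, rng, beta=1.0):
    """Negative ELBO for a toy linear-Gaussian VAE (forward pass only).

    x:     (n, d) inputs, e.g. one resampled curve per individual (hypothetical).
    enc_w: (d, 2 * k) maps x to [mu, log_var] of q(z | x).
    dec_w: (k, d) maps latent z back to a reconstruction of x.
    Returns (mean loss over individuals, posterior means mu of shape (n, k)).
    """
    h = x @ enc_w
    k = dec_w.shape[0]
    mu, log_var = h[:, :k], h[:, k:]
    # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    recon = z @ dec_w
    recon_err = np.sum((x - recon) ** 2, axis=1)  # Gaussian log-lik up to constants
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims.
    kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=1)
    return float(np.mean(recon_err + beta * kl)), mu


rng = np.random.default_rng(4)
x = rng.normal(size=(256, 100))
enc_w = 0.01 * rng.normal(size=(100, 2 * 5))
dec_w = 0.01 * rng.normal(size=(5, 100))
loss, embedding = vae_loss(x, enc_w, dec_w, rng)

# Sanity check: with all-zero weights, q(z | x) = N(0, I), so the KL term is 0
# and the reconstruction is 0, leaving the mean squared norm of x.
loss_zero, _ = vae_loss(x, np.zeros((100, 10)), np.zeros((5, 100)), rng)
```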
Affiliation(s)
- Zachary R McCaw
- Google Research, Mountain View, CA, USA
- Insitro, South San Francisco, CA, USA
- Davin Hill
- Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Robert Luben
- NIHR Biomedical Research Centre at Moorfields Eye Hospital and University College London (UCL) Institute of Ophthalmology, London, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- John Bates
- Verily Life Sciences, South San Francisco, CA, USA
- Tae-Hwi Schwantes-An
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Anthony P Khawaja
- NIHR Biomedical Research Centre at Moorfields Eye Hospital and University College London (UCL) Institute of Ophthalmology, London, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Brian D Hobbs
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
6
Zhao B, Li Y, Fan Z, Wu Z, Shu J, Yang X, Yang Y, Wang X, Li B, Wang X, Copana C, Yang Y, Lin J, Li Y, Stein JL, O'Brien JM, Li T, Zhu H. Eye-brain connections revealed by multimodal retinal and brain imaging genetics. Nat Commun 2024; 15:6064. [PMID: 39025851 PMCID: PMC11258354 DOI: 10.1038/s41467-024-50309-w] [Received: 06/23/2023] [Accepted: 07/02/2024] [Indexed: 07/20/2024]
Abstract
The retina, an anatomical extension of the brain, forms physiological connections with the visual cortex of the brain. Although retinal structures offer a unique opportunity to assess brain disorders, their relationship to brain structure and function is not well understood. In this study, we conducted a systematic cross-organ genetic architecture analysis of eye-brain connections using retinal and brain imaging endophenotypes. We identified novel phenotypic and genetic links between retinal imaging biomarkers and brain structure and function measures from multimodal magnetic resonance imaging (MRI), with many associations involving the primary visual cortex and visual pathways. Retinal imaging biomarkers shared genetic influences with brain diseases and complex traits in 65 genomic regions, with 18 showing genetic overlap with brain MRI traits. Mendelian randomization suggests bidirectional genetic causal links between retinal structures and neurological and neuropsychiatric disorders, such as Alzheimer's disease. Overall, our findings reveal the genetic basis for eye-brain connections, suggesting that retinal images can help uncover genetic risk factors for brain disorders and disease-related changes in intracranial structure and function.
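The abstract does not say which Mendelian randomization estimator was used. As a sketch of the general idea, the fixed-effect inverse-variance weighted (IVW) estimator combines per-variant Wald ratios (outcome effect divided by exposure effect) from GWAS summary statistics; testing the reverse direction amounts to swapping which trait supplies the instrument effects. The input values below are made up, and the function name `ivw_estimate` is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant's Wald ratio by/bx has first-order SE se_y/|bx|, so its
    inverse-variance weight is bx^2 / se_y^2; the weighted mean of the ratios
    simplifies to sum(bx*by/se^2) / sum(bx^2/se^2).
    """
    num = sum(bx * by / se**2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Made-up summary statistics in which every variant implies a causal effect of 0.5.
beta_x = [0.1, 0.2, 0.05]
beta_y = [0.05, 0.1, 0.025]
se_y = [0.01, 0.02, 0.01]
est, est_se = ivw_estimate(beta_x, beta_y, se_y)
```

The fixed-effect form assumes all instruments are valid; robust estimators are typically run alongside it to probe pleiotropy.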
Affiliation(s)
- Bingxin Zhao
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA.
- Applied Mathematics and Computational Science Graduate Group, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Center for AI and Data Science for Integrated Diagnostics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Penn Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Population Aging Research Center, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Yujue Li
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Zirui Fan
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Zhenyi Wu
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Juan Shu
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Xiaochen Yang
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Yilin Yang
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Xifeng Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Bingxuan Li
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Xiyao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Carlos Copana
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Yue Yang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jinjie Lin
- Yale School of Management, Yale University, New Haven, CT, 06511, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joan M O'Brien
- Scheie Eye Institute, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Penn Medicine Center for Ophthalmic Genetics in Complex Diseases, Philadelphia, PA, 19104, USA
- Tengfei Li
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Biomedical Research Imaging Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
7
Ashayeri H, Sobhi N, Pławiak P, Pedrammehr S, Alizadehsani R, Jafarizadeh A. Transfer Learning in Cancer Genetics, Mutation Detection, Gene Expression Analysis, and Syndrome Recognition. Cancers (Basel) 2024; 16:2138. [PMID: 38893257 PMCID: PMC11171544 DOI: 10.3390/cancers16112138] [Received: 05/05/2024] [Revised: 05/30/2024] [Accepted: 06/01/2024] [Indexed: 06/21/2024]
Abstract
Artificial intelligence (AI), encompassing machine learning (ML) and deep learning (DL), has revolutionized medical research, facilitating advancements in drug discovery and cancer diagnosis. ML identifies patterns in data, while DL employs neural networks for intricate processing. Predictive modeling challenges, such as data labeling, are addressed by transfer learning (TL), leveraging pre-existing models for faster training. TL shows potential in genetic research, improving tasks like gene expression analysis, mutation detection, genetic syndrome recognition, and genotype-phenotype association. This review explores the role of TL in overcoming challenges in mutation detection, genetic syndrome detection, gene expression, or phenotype-genotype association. TL has shown effectiveness in various aspects of genetic research. TL enhances the accuracy and efficiency of mutation detection, aiding in the identification of genetic abnormalities. TL can improve the diagnostic accuracy of syndrome-related genetic patterns. Moreover, TL plays a crucial role in gene expression analysis in order to accurately predict gene expression levels and their interactions. Additionally, TL enhances phenotype-genotype association studies by leveraging pre-trained models. In conclusion, TL enhances AI efficiency by improving mutation prediction, gene expression analysis, and genetic syndrome detection. Future studies should focus on increasing domain similarities, expanding databases, and incorporating clinical data for better predictions.
Affiliation(s)
- Hamidreza Ashayeri
- Student Research Committee, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Navid Sobhi
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Paweł Pławiak
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Krakow, Poland
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland
- Siamak Pedrammehr
- Faculty of Design, Tabriz Islamic Art University, Tabriz 5164736931, Iran
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Burwood, VIC 3216, Australia
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Burwood, VIC 3216, Australia
- Ali Jafarizadeh
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
8
Xie Z, Zhang T, Kim S, Lu J, Zhang W, Lin CH, Wu MR, Davis A, Channa R, Giancardo L, Chen H, Wang S, Chen R, Zhi D. iGWAS: Image-based genome-wide association of self-supervised deep phenotyping of retina fundus images. PLoS Genet 2024; 20:e1011273. [PMID: 38728357 PMCID: PMC11111076 DOI: 10.1371/journal.pgen.1011273] [Received: 12/08/2023] [Revised: 05/22/2024] [Accepted: 04/25/2024] [Indexed: 05/12/2024]
Abstract
Existing imaging genetics studies have been mostly limited in scope by using imaging-derived phenotypes defined by human experts. Here, leveraging new breakthroughs in self-supervised deep representation learning, we propose a new approach, image-based genome-wide association study (iGWAS), for identifying genetic factors associated with phenotypes discovered from medical images using contrastive learning. Using retinal fundus photos, our model extracts a 128-dimensional vector representing features of the retina as phenotypes. After training the model on 40,000 images from the EyePACS dataset, we generated phenotypes from 130,329 images of 65,629 British White participants in the UK Biobank. We conducted GWAS on these phenotypes and identified 14 loci with genome-wide significance (p < 5 × 10⁻⁸, taking the intersection of hits from the left and right eyes). We also performed a GWAS on retina color, the average color of the center region of the retinal fundus photos. The GWAS of retina color identified 34 loci, 7 of which overlap with those from the GWAS of the raw image phenotypes. Our results establish the feasibility of this new framework of genomic study based on self-supervised phenotyping of medical images.
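Two concrete steps in this abstract, averaging color over the central region of a fundus photo and keeping only hits significant in both eyes, can be sketched as below. The crop size (`frac`), the SNP identifiers, and the function names are hypothetical; the abstract does not define the exact center region or say whether the intersection was taken at the SNP or the locus level (this sketch uses SNPs).

```python
import numpy as np


def center_mean_color(image: np.ndarray, frac: float = 0.5) -> np.ndarray:
    """Mean RGB over a central square spanning `frac` of each side (assumed crop)."""
    h, w, _ = image.shape
    dh, dw = int(h * frac / 2), int(w * frac / 2)
    cy, cx = h // 2, w // 2
    crop = image[cy - dh:cy + dh, cx - dw:cx + dw]
    return crop.reshape(-1, 3).mean(axis=0)


def replicated_hits(p_left: dict, p_right: dict, alpha: float = 5e-8) -> set:
    """SNPs that reach significance in both the left-eye and right-eye GWAS."""
    left = {snp for snp, p in p_left.items() if p < alpha}
    right = {snp for snp, p in p_right.items() if p < alpha}
    return left & right


# Toy image whose central square is a uniform reddish color.
img = np.zeros((100, 100, 3))
img[25:75, 25:75] = [200.0, 80.0, 40.0]
color = center_mean_color(img)

# Made-up p-values keyed by hypothetical SNP IDs.
p_left = {"rs1": 1e-9, "rs2": 1e-3, "rs3": 2e-10}
p_right = {"rs1": 3e-8, "rs2": 1e-12, "rs3": 0.2}
hits = replicated_hits(p_left, p_right)
```

Requiring significance in both eyes acts as a built-in replication: the two images come from the same person, so a true genetic effect should appear in both, whereas an image artifact often would not.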
Affiliation(s)
- Ziqian Xie
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Tao Zhang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Sangbae Kim
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Jiaxiong Lu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Wanheng Zhang
- School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Cheng-Hui Lin
- Department of Ophthalmology, Stanford University School of Medicine, Stanford, California, United States of America
- Man-Ru Wu
- Department of Ophthalmology, Stanford University School of Medicine, Stanford, California, United States of America
- Alexander Davis
- Department of Ophthalmology, Stanford University School of Medicine, Stanford, California, United States of America
- Roomasa Channa
- Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison, Wisconsin, United States of America
- Luca Giancardo
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Han Chen
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Sui Wang
- Department of Ophthalmology, Stanford University School of Medicine, Stanford, California, United States of America
- Rui Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
9
Sigala RE, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M, Prokopenko I, Mahdi A, Demirkan A. Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
Affiliation(s)
- Rafaella E. Sigala
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Vasiliki Lagou
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Aleksey Shmeliov
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Sara Atito
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Samaneh Kouchaki
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Muhammad Awais
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Inga Prokopenko
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Adam Mahdi
- Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK
- Ayse Demirkan
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
10
Yun T, Cosentino J, Behsaz B, McCaw ZR, Hill D, Luben R, Lai D, Bates J, Yang H, Schwantes-An TH, Zhou Y, Khawaja AP, Carroll A, Hobbs BD, Cho MH, McLean CY, Hormozdiari F. Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases. medRxiv 2023:2023.04.28.23289285. [PMID: 37163049 PMCID: PMC10168505 DOI: 10.1101/2023.04.28.23289285] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets that have minimal expert phenotyping. We apply REGLE to both the respiratory and circulatory systems: spirograms, which measure lung function, and photoplethysmograms (PPG), which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance for asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
Affiliation(s)
- Davin Hill
- Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Robert Luben
- NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- John Bates
- Verily Life Sciences, South San Francisco, CA 94080, USA
- Tae-Hwi Schwantes-An
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Anthony P. Khawaja
- NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Brian D. Hobbs
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
11
Zhu C, Baumgarten N, Wu M, Wang Y, Das AP, Kaur J, Ardakani FB, Duong TT, Pham MD, Duda M, Dimmeler S, Yuan T, Schulz MH, Krishnan J. CVD-associated SNPs with regulatory potential reveal novel non-coding disease genes. Hum Genomics 2023; 17:69. [PMID: 37491351 PMCID: PMC10369730 DOI: 10.1186/s40246-023-00513-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 07/12/2023] [Indexed: 07/27/2023]
Abstract
BACKGROUND Cardiovascular diseases (CVDs) are the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified many CVD-associated single nucleotide polymorphisms (SNPs) in non-coding genomic regions. These SNPs may alter gene expression by modifying transcription factor (TF) binding sites, leading to functional consequences for cardiovascular traits or diseases. To understand the underlying molecular mechanisms, it is crucial to identify which variants are involved and how they affect TF binding. METHODS The SNEEP (SNP exploration and analysis using epigenomics data) pipeline was used to identify regulatory SNPs that alter the binding behavior of TFs and to link GWAS SNPs to their potential target genes for six CVDs. Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), monoculture cardiac organoids (MCOs) and self-organized cardiac organoids (SCOs) were used in the study. Gene expression, cardiomyocyte size and cardiac contractility were assessed. RESULTS Using our integrative computational pipeline, we identified 1905 regulatory SNPs in CVD GWAS data. These were associated with hundreds of genes, half of which were non-coding RNAs (ncRNAs), suggesting novel CVD genes. We experimentally tested 40 CVD-associated ncRNAs, among them RP11-98F14.11, RPL23AP92, IGBP1P1, and CTD-2383I20.1, which were upregulated in hiPSC-CMs, MCOs and SCOs under hypoxic conditions. Further experiments showed that IGBP1P1 depletion rescued expression of hypertrophic marker genes, reduced hypoxia-induced cardiomyocyte size and improved hypoxia-reduced cardiac contractility in hiPSC-CMs and MCOs. CONCLUSIONS IGBP1P1 is a novel ncRNA with key regulatory functions in modulating cardiomyocyte size and cardiac function in our disease models. Our data suggest that the ncRNA IGBP1P1 is a potential therapeutic target for improving cardiac function in CVDs.
Affiliation(s)
- Chaonan Zhu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Nina Baumgarten
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Meiqian Wu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Yue Wang
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Arka Provo Das
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Jaskiran Kaur
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Fatemeh Behjati Ardakani
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Thanh Thuy Duong
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
- Minh Duc Pham
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
- Maria Duda
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
- Stefanie Dimmeler
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Ting Yuan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany
- Marcel H Schulz
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Jaya Krishnan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany
12
Zhao B, Li Y, Fan Z, Wu Z, Shu J, Yang X, Yang Y, Wang X, Li B, Wang X, Copana C, Yang Y, Lin J, Li Y, Stein JL, O'Brien JM, Li T, Zhu H. Eye-brain connections revealed by multimodal retinal and brain imaging genetics in the UK Biobank. medRxiv 2023:2023.02.16.23286035. [PMID: 36824893 PMCID: PMC9949187 DOI: 10.1101/2023.02.16.23286035] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 02/19/2023]
Abstract
As an anatomical extension of the brain, the retina of the eye is synaptically connected to the visual cortex, establishing physiological connections between the eye and the brain. Despite the unique opportunity retinal structures offer for assessing brain disorders, relatively little is known about their relationship to brain structure and function. Here we present a systematic cross-organ genetic architecture analysis of eye-brain connections using retina and brain imaging endophenotypes. Novel phenotypic and genetic links were identified between retinal imaging biomarkers and brain structure and function measures derived from multimodal magnetic resonance imaging (MRI), many of which were involved in the visual pathways, including the primary visual cortex. In 65 genomic regions, retinal imaging biomarkers shared genetic influences with brain diseases and complex traits, 18 of which showed greater genetic overlap with brain MRI traits. Mendelian randomization suggests that retinal structures have bidirectional genetic causal links with neurological and neuropsychiatric disorders, such as Alzheimer's disease. Overall, cross-organ imaging genetics reveals a genetic basis for eye-brain connections, suggesting that retinal images can elucidate genetic risk factors for brain disorders and disease-related changes in intracranial structure and function.
Affiliation(s)
- Bingxin Zhao
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Yujue Li
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Zirui Fan
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Zhenyi Wu
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Juan Shu
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Xiaochen Yang
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Yilin Yang
- Department of Computer and Information Science and Electrical and Systems Engineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Xifeng Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bingxuan Li
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Xiyao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Carlos Copana
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
- Yue Yang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jinjie Lin
- Yale School of Management, Yale University, New Haven, CT 06511, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joan M. O'Brien
- Scheie Eye Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Penn Medicine Center for Ophthalmic Genetics in Complex Diseases, PA 19104, USA
- Tengfei Li
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Biomedical Research Imaging Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA