1
van Hilten A, van Rooij J, Ikram MA, Niessen WJ, van Meurs JBJ, Roshchupkin GV. Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data. NPJ Syst Biol Appl 2024; 10:81. [PMID: 39095438 PMCID: PMC11297229 DOI: 10.1038/s41540-024-00405-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 07/12/2024] [Indexed: 08/04/2024] Open
Abstract
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, total N = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omics networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
- Jeroen van Rooij
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- M Arfan Ikram
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Wiro J Niessen
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Joyce B J van Meurs
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Orthopaedics and Sports Medicine, Erasmus MC, Rotterdam, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
2
Delgado AB, Tylden ES, Lukic M, Moi L, Busund LTR, Lund E, Olsen KS. Cohort profile: The Clinical and Multi-omic (CAMO) cohort, part of the Norwegian Women and Cancer (NOWAC) study. PLoS One 2023; 18:e0281218. [PMID: 36745618 PMCID: PMC9901780 DOI: 10.1371/journal.pone.0281218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 01/18/2023] [Indexed: 02/07/2023] Open
Abstract
INTRODUCTION Breast cancer is the most common cancer worldwide and the leading cause of cancer related deaths among women. The high incidence and mortality of breast cancer calls for improved prevention, diagnostics, and treatment, including identification of new prognostic and predictive biomarkers for use in precision medicine. MATERIAL AND METHODS With the aim of compiling a cohort amenable to integrative study designs, we collected detailed epidemiological and clinical data, blood samples, and tumor tissue from a subset of participants from the prospective, population-based Norwegian Women and Cancer (NOWAC) study. These study participants were diagnosed with invasive breast cancer in North Norway before 2013 according to the Cancer Registry of Norway and constitute the Clinical and Multi-omic (CAMO) cohort. Prospectively collected questionnaire data on lifestyle and reproductive factors and blood samples were extracted from the NOWAC study, clinical and histopathological data were manually curated from medical records, and archived tumor tissue collected. RESULTS The lifestyle and reproductive characteristics of the study participants in the CAMO cohort (n = 388) were largely similar to those of the breast cancer patients in NOWAC (n = 10 356). The majority of the cancers in the CAMO cohort were tumor grade 2 and of the luminal A subtype. Approx. 80% were estrogen receptor positive, 13% were HER2 positive, and 12% were triple negative breast cancers. Lymph node metastases were present in 31% at diagnosis. The epidemiological dataset in the CAMO cohort is complemented by mRNA, miRNA, and metabolomics analyses in plasma, as well as miRNA profiling in tumor tissue. Additionally, histological analyses at the level of proteins and miRNAs in tumor tissue are currently ongoing. 
CONCLUSION The CAMO cohort provides data suitable for epidemiological, clinical, molecular, and multi-omics investigations, thereby enabling a systems epidemiology approach to translational breast cancer research.
Affiliation(s)
- André Berli Delgado
  - Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
- Eline Sol Tylden
  - Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
- Marko Lukic
  - Department of Community Medicine, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
- Line Moi
  - Department of Clinical Pathology, University Hospital of North Norway, Tromsø, Norway
- Lill-Tove Rasmussen Busund
  - Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
  - Department of Clinical Pathology, University Hospital of North Norway, Tromsø, Norway
- Eiliv Lund
  - Department of Community Medicine, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
- Karina Standahl Olsen
  - Department of Community Medicine, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
3
Krum-Hansen S, Standahl Olsen K, Anderssen E, Frantzen JO, Lund E, Paulssen RH. Associations of breast cancer related exposures and gene expression profiles in normal breast tissue-The Norwegian Women and Cancer normal breast tissue study. Cancer Rep (Hoboken) 2023; 6:e1777. [PMID: 36617746 PMCID: PMC10075301 DOI: 10.1002/cnr2.1777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Normal breast tissue is utilized in tissue-based studies of breast carcinogenesis. While gene expression in breast tumor tissue is well explored, our knowledge of transcriptomic signatures in normal breast tissue is still incomplete. The aim of this study was to investigate variability of gene expression in a large sample of normal breast tissue biopsies, according to breast cancer related exposures (obesity, smoking, alcohol, hormone therapy, and parity). METHODS We analyzed gene expression profiles from 311 normal breast tissue biopsies from cancer-free, post-menopausal women, using Illumina bead chip arrays. Principal component analysis and K-means clustering were used for initial analysis of the dataset. The association of exposures and covariates with gene expression was determined using linear models for microarrays. RESULTS Heterogeneity of the breast tissue and cell composition had the strongest influence on gene expression profiles. After adjusting for cell composition, obesity, smoking, and alcohol showed the highest numbers of associated genes and pathways, whereas hormone therapy and parity were associated with negligible gene expression differences. CONCLUSION Our results provide insight into associations between major exposures and gene expression profiles and provide an informative baseline for improved understanding of exposure-related molecular events in normal breast tissue of cancer-free, post-menopausal women.
Affiliation(s)
- Sanda Krum-Hansen
  - Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
  - Department of Hematology and Oncology, Stavanger University Hospital, Stavanger, Norway
- Karina Standahl Olsen
  - Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
- Endre Anderssen
  - Genomics Support Center Tromsø (GSCT), UiT The Arctic University of Norway, Tromsø, Norway
- Jan Ole Frantzen
  - Narvik Hospital, University Hospital of North Norway, Narvik, Norway
- Eiliv Lund
  - Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
- Ruth H Paulssen
  - Genomics Support Center Tromsø (GSCT), UiT The Arctic University of Norway, Tromsø, Norway
  - Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway
4
Immune cell-specific smoking-related expression characteristics are revealed by re-analysis of transcriptomes from the CEDAR cohort. Cent Eur J Immunol 2022; 47:246-259. [PMID: 36817262 PMCID: PMC9896985 DOI: 10.5114/ceji.2022.120618] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/10/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022] Open
Abstract
Introduction Smoking is known to affect whole-blood expression and methylation profiles. Although whole-genome methylation studies indicated that effects observed in blood may be driven by changes within leukocyte subtypes, these phenomena have not been explored using expression profiling. Material and methods This study reanalyzed data from the Correlated Expression and Disease Association Research (CEDAR) patient cohort recruited by Momozawa et al. (E-MTAB-6667). Data from gene expression profiling of immunomagnetically sorted CD4+, CD8+, CD14+, CD15+, and CD19+ cells were processed. Differential expression analyses were conducted in each immune cell type, followed by gene ontology analysis and supplementary investigations. Results Ninety-four differentially expressed genes were found (CD8+ n = 58, CD14+ n = 20, CD4+ n = 14, CD19+ n = 2). Two key smoking-related genes were overexpressed in specific cell types: LRRN3 (CD4+, CD8+) and MMP25 (CD8+, CD14+). In CD4+ cells smoking was associated with reduced expression of the NK cell receptor KLRB1, suggesting CD4+ subpopulation shifts and differences in interferon signaling (reduced IRF1 and IL18RAP in smokers). Key results and their integration with an immune protein-protein interaction network revealed that smoking influences integrins in CD8+ cells (ITGB7, ITGAL, ITGAM, ITGB2). C-type lectin CLEC4A was reduced in CD8+ cells and CLEC10A was increased in CD14+ cells from smokers; moreover, CLEC5A (CD8+), CLEC7A (CD8+) and CLEC9A (CD19+) were related to smoking in supplementary analyses. CD14+ cells from smokers exhibited overexpression of LDLR and the formyl peptide receptor FPR3. Conclusions Smoking specifically alters vital immune regulation genes in lymphocyte subtypes, especially CD4+, CD8+ and CD14+ cells.
5
Jareid M, Snapkov I, Holden M, Busund LTR, Lund E, Nøst TH. The blood transcriptome prior to ovarian cancer diagnosis: A case-control study in the NOWAC postgenome cohort. PLoS One 2021; 16:e0256442. [PMID: 34449791 PMCID: PMC8396762 DOI: 10.1371/journal.pone.0256442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 08/06/2021] [Indexed: 11/22/2022] Open
Abstract
Epithelial ovarian cancer (EOC) has a 5-year relative survival of 50%, partly because markers of early-stage disease are not available in current clinical diagnostics. The aim of the present study was to investigate whether EOC is associated with transcriptional profiles in blood collected up to 7 years before diagnosis. For this, we used RNA-stabilized whole blood, which contains circulating immune cells, from a sample of EOC cases from the population-based Norwegian Women and Cancer (NOWAC) postgenome cohort. We explored case-control differences in gene expression in all EOC (66 case-control pairs), as well as associations between gene expression and metastatic EOC (56 pairs), serous EOC (45 pairs, 44 of which were metastatic), and interval from blood sample collection to diagnosis (≤3 or >3 years; 34 and 31 pairs, respectively). Lastly, we assessed differential expression of genes associated with EOC in published functional genomics studies that used blood samples collected from newly diagnosed women. After adjustment for multiple testing, this nested case-control study revealed no significant case-control differences in gene expression in all EOC (false discovery rate q>0.96). With the exception of a few probes, the log2 fold change values obtained in gene-wise linear models were below ±0.2. P-values were lowest in analyses of metastatic EOC (80% of which were serous EOC). No common transcriptional profile was indicated by interval to diagnosis; when comparing the 100 genes with the lowest p-values in gene-wise tests in samples collected ≤3 and >3 years before EOC diagnosis, no overlap in these genes was observed. Among 86 genes linked to ovarian cancer in previous publications, our data contained expression values for 42, and of these, tests of LIME1, GPR162, STAB1, and SKAP1, resulted in unadjusted p<0.05. Although limited by sample size, our findings indicated less variation in blood gene expression between women with similar tumor characteristics.
Affiliation(s)
- Mie Jareid
  - Faculty of Health Sciences, Department of Community Medicine, UiT – The Arctic University of Norway, Tromsø, Norway
- Igor Snapkov
  - Department of Immunology, Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway
- Lill-Tove Rasmussen Busund
  - Faculty of Health Sciences, Department of Medical Biology, UiT – The Arctic University of Norway, Tromsø, Norway
- Eiliv Lund
  - Faculty of Health Sciences, Department of Community Medicine, UiT – The Arctic University of Norway, Tromsø, Norway
  - Cancer Registry of Norway, Oslo, Norway
- Therese Haugdahl Nøst
  - Faculty of Health Sciences, Department of Community Medicine, UiT – The Arctic University of Norway, Tromsø, Norway