1
|
Green space exposure and blood DNA methylation at birth and in childhood - A multi-cohort study. ENVIRONMENT INTERNATIONAL 2024; 188:108684. [PMID: 38776651 DOI: 10.1016/j.envint.2024.108684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 03/21/2024] [Accepted: 04/21/2024] [Indexed: 05/25/2024]
Abstract
Green space exposure has been associated with improved mental, physical and general health. However, the underlying biological mechanisms remain largely unknown. The aim of this study was to investigate the association between green space exposure and cord and child blood DNA methylation. Data from eight European birth cohorts with a total of 2,988 newborns and 1,849 children were used. Two indicators of residential green space exposure were assessed: (i) surrounding greenness (satellite-based Normalized Difference Vegetation Index (NDVI) in buffers of 100 m and 300 m) and (ii) proximity to green space (having a green space ≥ 5,000 m² within a distance of 300 m). For these indicators we assessed two exposure windows: (i) pregnancy, and (ii) the period from pregnancy to child blood DNA methylation assessment, termed cumulative exposure. DNA methylation was measured with the Illumina 450K or EPIC arrays. To identify differentially methylated positions (DMPs) we fitted robust linear regression models between pregnancy green space exposure and cord blood DNA methylation and between cumulative green space exposure and child blood DNA methylation. Two sensitivity analyses were conducted: (i) without adjusting for cellular composition, and (ii) adjusting for air pollution. Cohort results were combined through fixed-effect inverse variance weighted meta-analyses. Differentially methylated regions (DMRs) were identified from meta-analysed results using the Enmix-combp and DMRcate methods. There was no statistical evidence of pregnancy or cumulative exposures associating with any DMP (False Discovery Rate, FDR, p-value < 0.05). However, surrounding greenness exposure was inversely associated with four DMRs (three in cord blood and one in child blood) annotated to ADAMTS2, KCNQ1DN, SLC6A12 and SDK1 genes. Results did not change substantially in the sensitivity analyses.
Overall, we found little evidence of an association between green space exposure and blood DNA methylation. Although we identified associations of surrounding greenness exposure with four DMRs, these findings require replication.
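The abstract states that cohort results were combined through fixed-effect inverse variance weighted meta-analyses. A minimal sketch of that pooling step for a single CpG is given below, assuming each cohort reports a regression coefficient and its standard error. The function name `fixed_effect_meta` and the example numbers are hypothetical and are not taken from the study.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-cohort estimates with inverse-variance weights (fixed effect)."""
    weights = [1.0 / se ** 2 for se in ses]        # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                    # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-sided normal p-value
    return beta, se, z, p

# Hypothetical coefficients for one CpG in three cohorts (illustrative only)
example_betas = [-0.010, -0.004, -0.007]
example_ses = [0.005, 0.004, 0.006]
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta(example_betas, example_ses)
print(f"pooled beta={pooled_beta:.4f}, SE={pooled_se:.4f}, p={p_value:.3g}")
```

In an epigenome-wide setting this would be repeated for every CpG, with a multiple-testing correction such as FDR applied to the resulting p-values.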
|
2
|
Clonal chromosomal mosaicism and loss of chromosome Y in elderly men increase vulnerability for SARS-CoV-2. Commun Biol 2024; 7:202. [PMID: 38374351 PMCID: PMC10876565 DOI: 10.1038/s42003-024-05805-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 01/11/2024] [Indexed: 02/21/2024] Open
Abstract
The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, COVID-19) had an estimated overall case fatality ratio of 1.38% (pre-vaccination), being 53% higher in males and increasing exponentially with age. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, we found 133 cases (1.42%) with detectable clonal mosaicism for chromosome alterations (mCA) and 226 males (5.08%) with acquired loss of chromosome Y (LOY). Individuals with clonal mosaic events (mCA and/or LOY) showed a 54% increase in the risk of COVID-19 lethality. LOY is associated with transcriptomic biomarkers of immune dysfunction, pro-coagulation activity and cardiovascular risk. Interferon-induced genes involved in the initial immune response to SARS-CoV-2 are also down-regulated in LOY. Thus, mCA and LOY underlie at least part of the sex-biased severity and mortality of COVID-19 in aging patients. Given its potential therapeutic and prognostic relevance, evaluation of clonal mosaicism should be implemented as a biomarker of COVID-19 severity in elderly people.
|
3
|
Epimutation detection in the clinical context: guidelines and a use case from a new Bioconductor package. Epigenetics 2023; 18:2230670. [PMID: 37409354 DOI: 10.1080/15592294.2023.2230670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023] Open
Abstract
Epimutations are rare alterations of the normal DNA methylation pattern at specific loci, which can lead to rare diseases. Methylation microarrays enable genome-wide epimutation detection, but technical limitations prevent their use in clinical settings: methods applied to rare disease data cannot be easily incorporated into standard analysis pipelines, while epimutation methods implemented in R packages (ramr) have not been validated for rare diseases. We have developed epimutacions, a Bioconductor package (https://bioconductor.org/packages/release/bioc/html/epimutacions.html). epimutacions implements two previously reported methods and four new statistical approaches to detect epimutations, along with functions to annotate and visualize epimutations. Additionally, we have developed a user-friendly Shiny app (https://github.com/isglobal-brge/epimutacionsShiny) to facilitate epimutation detection for non-bioinformatician users. We first compared the performance of the epimutacions and ramr packages using three public datasets with experimentally validated epimutations. Methods in epimutacions performed well at low sample sizes and outperformed methods in ramr. Second, we used two general population children cohorts (INMA and HELIX) to determine the technical and biological factors that affect epimutation detection, providing guidelines on how to design the experiments or preprocess the data. In these cohorts, most epimutations did not correlate with detectable regional gene expression changes. Finally, we exemplified how epimutacions can be used in a clinical context. We ran epimutacions in a cohort of children with autism disorder and identified novel recurrent epimutations in candidate genes for autism. Overall, we present epimutacions, a new Bioconductor package for incorporating epimutation detection into rare disease diagnosis, and provide guidelines for study design and data analysis.
|
4
|
Prenatal environmental exposures associated with sex differences in childhood obesity and neurodevelopment. BMC Med 2023; 21:142. [PMID: 37046291 PMCID: PMC10099694 DOI: 10.1186/s12916-023-02815-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 03/06/2023] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND Obesity and neurodevelopmental delay are complex traits that often co-occur and differ between boys and girls. Prenatal exposures are believed to influence children's obesity, but it is unknown whether exposures of pregnant mothers can confer a different risk of obesity between sexes, and whether they can affect neurodevelopment. METHODS We analyzed data from 1044 children from the HELIX project, comprising 93 exposures during pregnancy, and clinical, neuropsychological, and methylation data during childhood (5-11 years). Using exposome-wide interaction analyses, we identified prenatal exposures with the highest sexual dimorphism in obesity risk, which were used to create a multiexposure profile. We applied causal random forest to classify individuals into two environments: E1 and E0. E1 consists of a combination of exposure levels where girls have significantly less risk of obesity than boys, as compared to E0, which consists of the remaining combination of exposure levels. We investigated whether the association between sex and neurodevelopmental delay also differed between E0 and E1. We used methylation data to perform an epigenome-wide association study between the environments to see the effect of belonging to E1 or E0 at the molecular level. RESULTS We observed that E1 was defined by the combination of low dairy consumption, non-smokers' cotinine levels in blood, low facility richness, and the presence of green spaces during pregnancy (ORinteraction = 0.070, P = 2.59 × 10⁻⁵). E1 was also associated with a lower risk of neurodevelopmental delay in girls, based on neuropsychological tests of non-verbal intelligence (ORinteraction = 0.42, P = 0.047) and working memory (ORinteraction = 0.31, P = 0.02). In line with this, several neurodevelopmental functions were enriched in significant differentially methylated probes between E1 and E0. CONCLUSIONS The risk of obesity can be different for boys and girls in certain prenatal environments. We identified an environment combining four exposure levels that protect girls from obesity and neurodevelopmental delay. The combination of single exposures into multiexposure profiles using causal inference can help determine populations at risk.
|
5
|
Sex Differences in the Association between Risk of Anterior Cruciate Ligament Rupture and COL5A1 Polymorphisms in Elite Footballers. Genes (Basel) 2022; 14:33. [PMID: 36672775 PMCID: PMC9858943 DOI: 10.3390/genes14010033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 12/15/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Single-nucleotide polymorphisms (SNPs) in collagen genes are predisposing factors for anterior cruciate ligament (ACL) rupture. Although these events are more frequent in females, the sex-specific risk of reported SNPs has not been evaluated. PURPOSE We aimed to assess the sex-specific risk of historic non-contact ACL rupture considering candidate SNPs in genes previously associated with muscle, tendon, ligament and ACL injury in elite footballers. STUDY DESIGN This was a cohort genetic association study. METHODS Forty-six (twenty-four females) footballers playing for the first team of FC Barcelona (Spain) during the 2020-21 season were included in the study. We evaluated the association between a history of non-contact ACL rupture before July 2022 and 108 selected SNPs, stratified by sex. SNPs with nominally significant associations in one sex were then tested for their interactions with sex on ACL. RESULTS Seven female (29%) and one male (4%) participants had experienced non-contact ACL rupture during their professional football career before the last date of observation. We found a significant association between the rs13946 C/C genotype and ACL injury in women footballers (p = 0.017). No significant associations were found in male footballers. The interaction between rs13946 and sex was significant (p = 0.027). We found that the C-allele of rs13946 was exclusive to one haplotype of five SNPs spanning COL5A1. CONCLUSIONS The present study suggests the role of SNPs in genes encoding for collagens as female risk factors for ACL injury in football players. CLINICAL RELEVANCE The genetic profiling of athletes at high risk of ACL rupture can contribute to sex-specific strategies for injury prevention in footballers.
|
6
|
Abstract
Environmental exposures during early life play a critical role in life-course health, yet the molecular phenotypes underlying environmental effects on health are poorly understood. In the Human Early Life Exposome (HELIX) project, a multi-centre cohort of 1301 mother-child pairs, we associate individual exposomes consisting of >100 chemical, outdoor, social and lifestyle exposures assessed in pregnancy and childhood, with multi-omics profiles (methylome, transcriptome, proteins and metabolites) in childhood. We identify 1170 associations, 249 in pregnancy and 921 in childhood, which reveal potential biological responses and sources of exposure. Pregnancy exposures, including maternal smoking, cadmium and molybdenum, are predominantly associated with child DNA methylation changes. In contrast, childhood exposures are associated with features across all omics layers, most frequently the serum metabolome, revealing signatures for diet, toxic chemical compounds, essential trace elements, and weather conditions, among others. Our comprehensive and unique resource of all associations ( https://helixomics.isglobal.org/ ) will serve to guide future investigation into the biological imprints of the early life exposome.
|
7
|
Software Application Profile: ShinyDataSHIELD—an R Shiny application to perform federated non-disclosive data analysis in multicohort studies. Int J Epidemiol 2022. [DOI: 10.1093/ije/dyac201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Motivation
DataSHIELD is an open-source software infrastructure enabling the analysis of data distributed across multiple databases (federated data) without leaking individuals’ information (non-disclosive). It has applications in many scientific domains, ranging from biosciences to social sciences and including high-throughput genomic studies. R is the language used to interact with (and build) DataSHIELD. This creates difficulties for researchers who do not have experience writing R code or lack the time to learn how to use the DataSHIELD functions. To help new researchers use the DataSHIELD infrastructure and to improve the user-friendliness for experienced researchers, we present ShinyDataSHIELD.
Implementation
ShinyDataSHIELD is a web application with an R backend that serves as a graphical user interface (GUI) to the DataSHIELD infrastructure.
General features
The version of the application presented here includes modules to perform: (i) exploratory analysis through descriptive summary statistics and graphical representations (scatter plots, histograms, heatmaps and boxplots); (ii) statistical modelling (generalized linear fixed and mixed-effects models, survival analysis through Cox regression); (iii) genome-wide association studies (GWAS); and (iv) omic analysis (transcriptomics, epigenomics and multi-omic integration).
Availability
ShinyDataSHIELD is publicly hosted online [https://datashield-demo.obiba.org/], the source code and user guide are deposited on Zenodo DOI 10.5281/zenodo.6500323, freely available to non-commercial users under ‘Commons Clause’ License Condition v1.0. Docker images are also available [https://hub.docker.com/r/brgelab/shiny-data-shield].
|
8
|
Mass Spectrometry Identification of Biomarkers in Extracellular Vesicles From Plasmodium vivax Liver Hypnozoite Infections. Mol Cell Proteomics 2022; 21:100406. [PMID: 36030044 PMCID: PMC9520272 DOI: 10.1016/j.mcpro.2022.100406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2021] [Revised: 08/12/2022] [Accepted: 08/20/2022] [Indexed: 01/18/2023] Open
Abstract
Latent liver stages termed hypnozoites cause relapsing Plasmodium vivax malaria infection and represent a major obstacle in the goal of malaria elimination. Hypnozoites are clinically undetectable, and presently, there are no biomarkers of this persistent parasite reservoir in the human liver. Here, we have identified parasite and human proteins associated with extracellular vesicles (EVs) secreted from in vivo infections exclusively containing hypnozoites. We used P. vivax-infected human liver-chimeric (huHEP) FRG KO mice treated with the schizonticidal experimental drug MMV048 as hypnozoite infection model. Immunofluorescence-based quantification of P. vivax liver forms showed that MMV048 removed schizonts from chimeric mice livers. Proteomic analysis of EVs derived from FRG huHEP mice showed that human EV cargo from infected FRG huHEP mice contain inflammation markers associated with active schizont replication and identified 66 P. vivax proteins. To identify hypnozoite-specific proteins associated with EVs, we mined the proteome data from MMV048-treated mice and performed an analysis involving intragroup and intergroup comparisons across all experimental conditions followed by a peptide compatibility analysis with predicted spectra to warrant robust identification. Only one protein fulfilled this stringent top-down selection, a putative filamin domain-containing protein. This study sets the stage to unveil biological features of human liver infections and identify biomarkers of hypnozoite infection associated with EVs.
|
9
|
The early-life exposome modulates the effect of polymorphic inversions on DNA methylation. Commun Biol 2022; 5:455. [PMID: 35550596 PMCID: PMC9098634 DOI: 10.1038/s42003-022-03380-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Accepted: 04/19/2022] [Indexed: 11/14/2022] Open
Abstract
Polymorphic genomic inversions are chromosomal variants with intrinsic variability that play important roles in evolution, environmental adaptation, and complex traits. We investigated the DNA methylation patterns of three common human inversions, at 8p23.1, 16p11.2, and 17q21.31, in 1,009 blood samples from children from the Human Early Life Exposome (HELIX) project and in 39 prenatal heart tissue samples. We found inversion-state specific methylation patterns within and near the flanks of each inversion region in both datasets. Additionally, numerous inversion-exposure interactions on methylation levels were identified from early-life exposome data comprising 64 exposures. For instance, children homozygous for inv-8p23.1 with higher meat intake were more susceptible to TDH hypermethylation (P = 3.8 × 10⁻²²); the inversion, the exposure, and the gene are all known risk factors for adult obesity. Inv-8p23.1 associated hypermethylation of GATA4 was also detected across numerous exposures. Our data suggest that the pleiotropic influence of inversions during development and lifetime could be substantially mediated by allele-specific methylation patterns which can be modulated by the exposome. Analysis of the relationship between presence of common DNA sequence inversions and DNA methylation patterns suggests a role for environmental exposures (such as food intake) in mediating inversion state-specific methylation patterns.
|
10
|
Systematic Collaborative Reanalysis of Genomic Data Improves Diagnostic Yield in Neurologic Rare Diseases. J Mol Diagn 2022; 24:529-542. [PMID: 35569879 DOI: 10.1016/j.jmoldx.2022.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Revised: 12/16/2021] [Accepted: 02/03/2022] [Indexed: 11/26/2022] Open
Abstract
Many patients experiencing a rare disease remain undiagnosed even after genomic testing. Reanalysis of existing genomic data has been shown to increase diagnostic yield, although there are few systematic and comprehensive reanalysis efforts that enable collaborative interpretation and future reinterpretation. The Undiagnosed Rare Disease Program of Catalonia project collated previously inconclusive good-quality genomic data (panels, exomes, and genomes) and standardized phenotypic profiles from 323 families (543 individuals) with a neurologic rare disease. The data were reanalyzed systematically to identify relatedness, runs of homozygosity, consanguinity, single-nucleotide variants, insertions and deletions, and copy number variants. Data were shared and collaboratively interpreted within the consortium through a customized Genome-Phenome Analysis Platform, which also enables future data reinterpretation. Reanalysis of existing genomic data provided a diagnosis for 20.7% of the patients, including 1.8% diagnosed after the generation of additional genomic data to identify a second pathogenic heterozygous variant. Diagnostic rate was significantly higher for family-based exome/genome reanalysis compared with singleton panels. Most new diagnoses were attributable to recent gene-disease associations (50.8%), additional or improved bioinformatic analysis (19.7%), and standardized phenotyping data integrated within the Undiagnosed Rare Disease Program of Catalonia Genome-Phenome Analysis Platform functionalities (18%).
|
11
|
teff: estimation of Treatment EFFects on transcriptomic data using causal random forest. Bioinformatics 2022; 38:3124-3125. [DOI: 10.1093/bioinformatics/btac269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 04/07/2022] [Accepted: 04/11/2022] [Indexed: 11/14/2022] Open
Abstract
Motivation
Causal inference on high dimensional feature data can be used to find a profile of patients who will benefit the most from treatment rather than no treatment. However, there is a need for usable implementations for transcriptomic data. We developed teff that applies random causal forest on gene expression data to target individuals with high expected treatment effects.
Results
We extracted a profile of high benefit of treating psoriasis with brodalumab and observed that it was associated with higher T cell abundance in non-lesional skin at baseline and a lower response for etanercept in an independent study. Individual patient targeting with causal inference profiling can inform patients on choosing between treatments before the intervention begins.
Availability and Implementation
teff is an R package available at https://teff-package.github.io
Supplementary information
Supplementary data are available at Bioinformatics online.
|
12
|
Identification of autosomal cis expression quantitative trait methylation (cis eQTMs) in children's blood. eLife 2022; 11:65310. [PMID: 35302492 PMCID: PMC8933004 DOI: 10.7554/elife.65310] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/04/2020] [Accepted: 02/11/2022] [Indexed: 12/12/2022] Open
Abstract
Background The identification of expression quantitative trait methylation (eQTMs), defined as associations between DNA methylation levels and gene expression, might help the biological interpretation of epigenome-wide association studies (EWAS). We aimed to identify autosomal cis eQTMs in children's blood, using data from 832 children of the Human Early Life Exposome (HELIX) project. Methods Blood DNA methylation and gene expression were measured with the Illumina 450K and the Affymetrix HTA v2 arrays, respectively. The relationship between methylation levels and expression of nearby genes (1 Mb window centered at the transcription start site, TSS) was assessed by fitting 13.6 M linear regressions adjusting for sex, age, cohort, and blood cell composition. Results We identified 39,749 blood autosomal cis eQTMs, representing 21,966 unique CpGs (eCpGs, 5.7% of total CpGs) and 8,886 unique transcript clusters (eGenes, 15.3% of total transcript clusters, equivalent to genes). In 87.9% of these cis eQTMs, the eCpG was located at <250 kb from eGene's TSS; and 58.8% of all eQTMs showed an inverse relationship between the methylation and expression levels. Only around half of the autosomal cis-eQTMs eGenes could be captured through annotation of the eCpG to the closest gene. eCpGs had less measurement error and were enriched for active blood regulatory regions and for CpGs reported to be associated with environmental exposures or phenotypic traits. In 40.4% of the eQTMs, the CpG and the eGene were both associated with at least one genetic variant. The overlap of autosomal cis eQTMs in children's blood with those described in adults was small (13.8%), and age-shared cis eQTMs tended to be proximal to the TSS and enriched for genetic variants. Conclusions This catalogue of autosomal cis eQTMs in children's blood can help the biological interpretation of EWAS findings and is publicly available at https://helixomics.isglobal.org/ and at Dryad (doi:10.5061/dryad.fxpnvx0t0). 
Funding The study has received funding from the European Community's Seventh Framework Programme (FP7/2007-206) under grant agreement no 308333 (HELIX project); the H2020-EU.3.1.2. - Preventing Disease Programme under grant agreement no 874583 (ATHLETE project); from the European Union's Horizon 2020 research and innovation programme under grant agreement no 733206 (LIFECYCLE project), and from the European Joint Programming Initiative "A Healthy Diet for a Healthy Life" (JPI HDHL and Instituto de Salud Carlos III) under the grant agreement no AC18/00006 (NutriPROGRAM project). The genotyping was supported by the projects PI17/01225 and PI17/01935, funded by the Instituto de Salud Carlos III and co-funded by European Union (ERDF, "A way to make Europe") and the Centro Nacional de Genotipado-CEGEN (PRB2-ISCIII). BiB received core infrastructure funding from the Wellcome Trust (WT101597MA) and a joint grant from the UK Medical Research Council (MRC) and Economic and Social Science Research Council (ESRC) (MR/N024397/1). INMA data collections were supported by grants from the Instituto de Salud Carlos III, CIBERESP, and the Generalitat de Catalunya-CIRIT. KANC was funded by the grant of the Lithuanian Agency for Science Innovation and Technology (6-04-2014_31V-66). The Norwegian Mother, Father and Child Cohort Study is supported by the Norwegian Ministry of Health and Care Services and the Ministry of Education and Research. The Rhea project was financially supported by European projects (EU FP6-2003-Food-3-NewGeneris, EU FP6. STREP Hiwate, EU FP7 ENV.2007.1.2.2.2. Project No 211250 Escape, EU FP7-2008-ENV-1.2.1.4 Envirogenomarkers, EU FP7-HEALTH-2009- single stage CHICOS, EU FP7 ENV.2008.1.2.1.6. Proposal No 226285 ENRIECO, EU- FP7- HEALTH-2012 Proposal No 308333 HELIX), and the Greek Ministry of Health (Program of Prevention of obesity and neurodevelopmental disorders in preschool children, in Heraklion district, Crete, Greece: 2011-2014; "Rhea Plus": Primary Prevention Program of Environmental Risk Factors for Reproductive Health, and Child Health: 2012-15). We acknowledge support from the Spanish Ministry of Science and Innovation through the "Centro de Excelencia Severo Ochoa 2019-2023" Program (CEX2018-000806-S), and support from the Generalitat de Catalunya through the CERCA Program. MV-U and CR-A were supported by a FI fellowship from the Catalan Government (FI-DGR 2015 and #016FI_B 00272). MC received funding from Instituto Carlos III (Ministry of Economy and Competitiveness) (CD12/00563 and MS16/00128).
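The eQTM abstract above tests CpGs against the expression of nearby genes within a 1 Mb window centred at the TSS. The sketch below shows one way such candidate CpG-gene pairs could be enumerated before fitting the regressions, assuming the window means at most 500 kb on either side of the TSS, with the boundary included. The function `cis_pairs` and all identifiers and coordinates are hypothetical and are not taken from the study.

```python
HALF_WINDOW = 500_000  # assumed: 1 Mb window centred on the TSS = +/- 500 kb

def cis_pairs(cpgs, genes, half_window=HALF_WINDOW):
    """List (cpg_id, gene_id, offset) for CpGs inside each gene's cis window.

    cpgs:  iterable of (cpg_id, chromosome, position)
    genes: iterable of (gene_id, chromosome, tss_position)
    offset is the CpG position minus the TSS in genomic coordinates;
    strand is ignored in this sketch.
    """
    pairs = []
    for gene_id, gene_chrom, tss in genes:
        for cpg_id, cpg_chrom, pos in cpgs:
            if cpg_chrom == gene_chrom and abs(pos - tss) <= half_window:
                pairs.append((cpg_id, gene_id, pos - tss))
    return pairs

# Hypothetical coordinates, for illustration only
example_cpgs = [("cpgA", "chr1", 1_200_000), ("cpgB", "chr1", 2_000_000), ("cpgC", "chr2", 1_200_000)]
example_genes = [("geneX", "chr1", 1_000_000)]
for pair in cis_pairs(example_cpgs, example_genes):
    print(pair)
```

Each resulting pair would then get its own linear regression of expression on methylation with the covariates named in the abstract (sex, age, cohort, and blood cell composition). The nested loop is kept for readability. At the scale of 13.6 M tests, an interval index or sorted positions would be the more practical choice.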
|
13
|
Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:6535682. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
|
14
|
Meta-analysis of epigenome-wide association studies in newborns and children show widespread sex differences in blood DNA methylation. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2022; 789:108415. [PMID: 35690418 PMCID: PMC9623595 DOI: 10.1016/j.mrrev.2022.108415] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/20/2021] [Revised: 02/27/2022] [Accepted: 03/08/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Among children, sex-specific differences in disease prevalence, age of onset, and susceptibility have been observed in health conditions including asthma, immune response, metabolic health, some pediatric and adult cancers, and psychiatric disorders. Epigenetic modifications such as DNA methylation may play a role in the sexual differences observed in diseases and other physiological traits. METHODS We performed a meta-analysis of the association of sex and cord blood DNA methylation at over 450,000 CpG sites in 8438 newborns from 17 cohorts participating in the Pregnancy And Childhood Epigenetics (PACE) Consortium. We also examined associations of child sex with DNA methylation in older children ages 5.5-10 years from 8 cohorts (n = 4268). RESULTS In newborn blood, sex was associated at Bonferroni level significance with differences in DNA methylation at 46,979 autosomal CpG sites (p < 1.3 × 10⁻⁷) after adjusting for white blood cell proportions and batch. Most of those sites had lower methylation levels in males than in females. Of the differentially methylated CpG sites identified in newborn blood, 68% (31,727) met look-up level significance (p < 1.1 × 10⁻⁶) in older children and had methylation differences in the same direction. CONCLUSIONS This is a large-scale meta-analysis examining sex differences in DNA methylation in newborns and older children. Expanding upon previous studies, we replicated previous findings and identified additional autosomal sites with sex-specific differences in DNA methylation. Differentially methylated sites were enriched in genes involved in cancer, psychiatric disorders, and cardiovascular phenotypes.
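The look-up significance level quoted above is consistent with a Bonferroni correction over the 46,979 sites carried forward from the newborn analysis. The short check below assumes a family-wise alpha of 0.05, which the abstract does not state explicitly. The helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Assumed alpha = 0.05; 46,979 is the number of sites taken to the look-up
lookup_threshold = bonferroni_threshold(0.05, 46_979)
print(f"{lookup_threshold:.2e}")  # 1.06e-06, i.e. about 1.1 x 10^-6
```

The same helper would give the discovery threshold if the number of CpGs actually tested after quality control were known. The abstract reports only "over 450,000" sites on the array, so that count is left unspecified here.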
|
15
|
The early-life exposome and epigenetic age acceleration in children. ENVIRONMENT INTERNATIONAL 2021; 155:106683. [PMID: 34144479 DOI: 10.1016/j.envint.2021.106683] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 03/04/2021] [Revised: 06/01/2021] [Accepted: 06/01/2021] [Indexed: 06/12/2023]
Abstract
The early-life exposome influences future health and accelerated biological aging has been proposed as one of the underlying biological mechanisms. We investigated the association between more than 100 exposures assessed during pregnancy and in childhood (including indoor and outdoor air pollutants, built environment, green environments, tobacco smoking, lifestyle exposures, and biomarkers of chemical pollutants), and epigenetic age acceleration in 1,173 children aged 7 years old from the Human Early-Life Exposome project. Age acceleration was calculated based on Horvath's Skin and Blood clock using child blood DNA methylation measured by Infinium HumanMethylation450 BeadChips. We performed an exposure-wide association study between prenatal and childhood exposome and age acceleration. Maternal tobacco smoking during pregnancy was nominally associated with increased age acceleration. For childhood exposures, indoor particulate matter absorbance (PMabs) and parental smoking were nominally associated with an increase in age acceleration. Exposure to the organic pesticide dimethyl dithiophosphate and the persistent pollutant polychlorinated biphenyl-138 (inversely associated with child body mass index) were protective for age acceleration. None of the associations remained significant after multiple-testing correction. Pregnancy and childhood exposure to tobacco smoke and childhood exposure to indoor PMabs may accelerate epigenetic aging from an early age.
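The abstract does not spell out how age acceleration was derived from the clock estimate. One common definition, shown here only as a hedged sketch and not as the study's confirmed method, takes the residuals from a linear regression of DNA methylation age on chronological age. The function name and the toy values below are hypothetical.

```python
def age_acceleration(dnam_age, chrono_age):
    """Residuals of an ordinary least-squares fit of DNAm age on chronological age.

    Positive values mean the epigenetic age is higher than expected for the
    chronological age. This is one common definition, assumed for illustration.
    """
    n = len(chrono_age)
    mean_x = sum(chrono_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chrono_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chrono_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chrono_age, dnam_age)]


# Hypothetical toy data (years); not taken from the study.
chrono = [6.5, 7.0, 7.5, 8.0, 8.5]
dnam = [6.9, 7.1, 8.2, 8.0, 9.1]

acc = age_acceleration(dnam, chrono)
for c, a in zip(chrono, acc):
    print(f"age {c:.1f}: acceleration {a:+.2f}")
```

Because the residuals of a least-squares fit with an intercept sum to zero, this measure is uncorrelated with chronological age by construction, which is why it is often preferred over the raw difference between the two ages.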
|
16
|
methylclock: a Bioconductor package to estimate DNA methylation age. Bioinformatics 2021; 37:1759-1760. [PMID: 32960939 DOI: 10.1093/bioinformatics/btaa825] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 04/21/2020] [Revised: 07/23/2020] [Accepted: 09/08/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Ageing is a biological and psychosocial process related to diseases and mortality. It correlates with changes in DNA methylation (DNAm) in all human tissues. Therefore, epigenetic markers can be used to estimate biological age using DNAm profiling across tissues. RESULTS We developed a Bioconductor package that allows computation of several existing DNAm adult/childhood and gestational age clocks. Functions to visualize the DNAm age prediction versus chronological age and the correlation between DNAm clocks are also available as well as other features, such as missing data imputation of cell types' estimates, that are required for DNAm age clocks. AVAILABILITY AND IMPLEMENTATION https://github.com/isglobal-brge/methylclock. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
17
|
Abstract
BACKGROUND Multiple omics technologies are increasingly applied to detect early, subtle molecular responses to environmental stressors for future disease risk prevention. However, there is an urgent need for further evaluation of stability and variability of omics profiles in healthy individuals, especially during childhood. METHODS We aimed to estimate intra-, inter-individual and cohort variability of multi-omics profiles (blood DNA methylation, gene expression, miRNA, proteins and serum and urine metabolites) measured 6 months apart in 156 healthy children from five European countries. We further performed a multi-omics network analysis to establish clusters of co-varying omics features and assessed the contribution of key variables (including biological traits and sample collection parameters) to omics variability. RESULTS All omics displayed a large range of intra- and inter-individual variability depending on each omics feature, although all presented a highest median intra-individual variability. DNA methylation was the most stable profile (median 37.6% inter-individual variability) while gene expression was the least stable (6.6%). Among the least stable features, we identified 1% cross-omics co-variation between CpGs and metabolites (e.g. glucose and CpGs related to obesity and type 2 diabetes). Explanatory variables, including age and body mass index (BMI), explained up to 9% of serum metabolite variability. CONCLUSIONS Methylation and targeted serum metabolomics are the most reliable omics to implement in single time-point measurements in large cross-sectional studies. In the case of metabolomics, sample collection and individual traits (e.g. BMI) are important parameters to control for improved comparability, at the study design or analysis stage. This study will be valuable for the design and interpretation of epidemiological studies that aim to link omics signatures to disease, environmental exposures, or both.
|
18
|
Extreme Downregulation of Chromosome Y and Cancer Risk in Men. J Natl Cancer Inst 2021; 112:913-920. [PMID: 31945786 DOI: 10.1093/jnci/djz232] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/12/2019] [Revised: 10/31/2019] [Accepted: 12/11/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Understanding the biological differences between sexes in cancer is essential for personalized treatment and prevention. We hypothesized that the extreme downregulation of chromosome Y gene expression (EDY) is a signature of cancer risk in men and the functional mediator of the reported association between the mosaic loss of chromosome Y (LOY) and cancer. METHODS We advanced a method to measure EDY from transcriptomic data. We studied EDY across 47 nondiseased tissues from the Genotype Tissue-Expression Project (n = 371) and its association with cancer status across 12 cancer studies from The Cancer Genome Atlas (n = 1774) and seven other studies (n = 7562). Associations of EDY with cancer status and presence of loss-of-function mutations in chromosome X were tested with logistic regression models, and a Fisher's test was used to assess genome-wide association of EDY with the proportion of copy number gains. All statistical tests were two-sided. RESULTS EDY was likely to occur in multiple nondiseased tissues (P < .001) and was statistically significantly associated with the EGFR tyrosine kinase inhibitor resistance pathway (false discovery rate = 0.028). EDY strongly associated with cancer risk in men (odds ratio [OR] = 3.66, 95% confidence interval [CI] = 1.58 to 8.46, P = .002), adjusted by LOY and age, and its variability was largely explained by several genes of the nonrecombinant region whose chromosome X homologs showed loss-of-function mutations that co-occurred with EDY during cancer (OR = 2.82, 95% CI = 1.32 to 6.01, P = .007). EDY associated with a high proportion of EGFR amplifications (OR = 5.64, 95% CI = 3.70 to 8.59, false discovery rate < 0.001) and EGFR overexpression along with SRY hypomethylation and nonrecombinant region hypermethylation, indicating alternative causes of EDY in cancer other than LOY.
EDY associations were independently validated for different cancers and exposure to smoking, and its status was accurately predicted from individual methylation patterns. CONCLUSIONS EDY is a male-specific signature of cancer susceptibility that supports the escape from X-inactivation tumor suppressor hypothesis for genes that protect women compared with men from cancer risk.
|
19
|
Publisher Correction: MLIP genotype as a predictor of pharmacological response in primary open-angle glaucoma and ocular hypertension. Sci Rep 2021; 11:8237. [PMID: 33837244 PMCID: PMC8035325 DOI: 10.1038/s41598-021-87653-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
20
|
Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD. PLoS Comput Biol 2021; 17:e1008880. [PMID: 33784300 PMCID: PMC8034722 DOI: 10.1371/journal.pcbi.1008880] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/23/2020] [Revised: 04/09/2021] [Accepted: 03/17/2021] [Indexed: 01/31/2023] Open
Abstract
Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers' ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).
|
21
|
MLIP genotype as a predictor of pharmacological response in primary open-angle glaucoma and ocular hypertension. Sci Rep 2021; 11:1583. [PMID: 33452295 PMCID: PMC7810753 DOI: 10.1038/s41598-020-80954-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2020] [Accepted: 12/24/2020] [Indexed: 11/20/2022] Open
Abstract
Predicting the therapeutic response to ocular hypotensive drugs is crucial for the clinical treatment and management of glaucoma. Our aim was to identify a possible genetic contribution to the response to current pharmacological treatments of choice in a white Mediterranean population with primary open-angle glaucoma (POAG) or ocular hypertension (OH). We conducted a prospective, controlled, randomized, partial crossover study that included 151 patients of both genders, aged 18 years and older, diagnosed with and requiring pharmacological treatment for POAG or OH in one or both eyes. We sought to identify copy number variants (CNVs) associated with differences in pharmacological response, using a DNA pooling strategy of carefully phenotyped treatment responders and non-responders, treated for a minimum of 6 weeks with a beta-blocker (timolol maleate) and/or prostaglandin analog (latanoprost). Diurnal intraocular pressure reduction and comparative genome-wide CNVs were analyzed. Our finding that copy number alleles of an intronic portion of the MLIP gene are a predictor of pharmacological response to beta-blockers and prostaglandin analogs could be used as a biomarker to guide first-tier POAG and OH treatment. Our finding improves understanding of the genetic factors modulating pharmacological response in POAG and OH, and represents an important contribution to the establishment of a personalized approach to the treatment of glaucoma.
|
22
|
Urinary metabolite quantitative trait loci in children and their interaction with dietary factors. Hum Mol Genet 2020; 29:3830-3844. [PMID: 33283231 DOI: 10.1093/hmg/ddaa257] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/18/2020] [Revised: 11/26/2020] [Accepted: 11/30/2020] [Indexed: 11/14/2022] Open
Abstract
Human metabolism is influenced by genetic and environmental factors. Previous studies have identified over 23 loci associated with more than 26 urine metabolite levels in adults, which are known as urinary metabolite quantitative trait loci (metabQTLs). The aim of the present study is the identification for the first time of urinary metabQTLs in children and their interaction with dietary patterns. Association between genome-wide genotyping data and 44 urine metabolite levels measured by proton nuclear magnetic resonance spectroscopy was tested in 996 children from the Human Early Life Exposome project. Twelve statistically significant urine metabQTLs were identified, involving 11 unique loci and 10 different metabolites. Comparison with previous findings in adults revealed that six metabQTLs were already known, one had been described in serum, and three involved the same locus as other reported metabQTLs but different urinary metabolites. The remaining two metabQTLs represent novel urine metabolite-locus associations, which are reported for the first time in this study [single nucleotide polymorphism (SNP) rs12575496 for taurine, and the missense SNP rs2274870 for 3-hydroxyisobutyrate]. Moreover, it was found that urinary taurine levels were affected by the combined action of genetic variation and dietary patterns of meat intake as well as by the interaction of this SNP with beverage intake dietary patterns. Overall, we identified 12 urinary metabQTLs in children, including two novel associations. While a substantial part of the identified loci affected urinary metabolite levels both in children and in adults, the metabQTL for taurine seemed to be specific to children and interacted with dietary patterns.
|
23
|
Female-specific risk of Alzheimer's disease is associated with tau phosphorylation processes: A transcriptome-wide interaction analysis. Neurobiol Aging 2020; 96:104-108. [PMID: 32977080 DOI: 10.1016/j.neurobiolaging.2020.08.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/14/2020] [Revised: 08/25/2020] [Accepted: 08/25/2020] [Indexed: 01/09/2023]
Abstract
The levels of tau phosphorylation differ between sexes in Alzheimer's disease (AD). Transcriptome-wide associations of sex by disease interaction could indicate whether specific genes underlie sex differences in tau pathology; however, no such study has been reported yet. We report the first analysis of the effect of the interaction between disease status and sex on differential gene expression, meta-analyzing transcriptomic data from the 3 largest publicly available case-control studies (N = 785) in the brain to date. A total of 128 genes, significantly associated with sex-AD interactions, were enriched in phosphoproteins (false discovery rate (FDR) = 0.001). High and consistent associations were found for the overexpression of NCL (FDR = 0.002), whose phosphorylated protein generates an epitope against neurofibrillary tangles, and KIF2A (FDR = 0.005), a microtubule-associated motor protein gene. Transcriptome-wide interaction analyses suggest sex-modulated tau phosphorylation at sites like Thr231, Ser199, or Ser202 that could increase women's risk of AD, and indicate sex-specific strategies for intervention and prevention.
|
24
|
MADloy: robust detection of mosaic loss of chromosome Y from genotype-array-intensity data. BMC Bioinformatics 2020; 21:533. [PMID: 33225898 PMCID: PMC7682048 DOI: 10.1186/s12859-020-03768-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/30/2019] [Accepted: 09/20/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Accurate protocols and methods to robustly detect the mosaic loss of chromosome Y (mLOY) are needed given its reported role in cancer, several age-related disorders and overall male mortality. Intensity SNP-array data have been used to infer mLOY status and to determine its prominent role in male disease. However, discrepancies of reported findings can be due to the uncertainty and variability of the methods used for mLOY detection and to the differences in the tissue-matrix used. RESULTS We created a publicly available software tool called MADloy (Mosaic Alteration Detection for LOY) that incorporates existing methods and includes a new robust approach, allowing efficient calling in large studies and comparisons between methods. MADloy optimizes mLOY calling by correctly modeling the underlying reference population with no-mLOY status and incorporating B-deviation information. We observed improvements in calling accuracy over previous methods, using experimentally validated samples, and an increase in the statistical power to detect associations with disease and mortality, using simulation studies and real dataset analyses. To understand discrepancies in mLOY detection across different tissues, we applied MADloy to detect the increase of mLOY cellularity in blood in 18 individuals after 3 years and to confirm that its detection in saliva was sub-optimal (41%). We additionally applied MADloy to detect the down-regulation of chromosome Y genes in kidney and bladder tumors with mLOY, and to perform pathway analyses for the detection of mLOY in blood. CONCLUSIONS MADloy is a new software tool implemented in R for the easy and robust calling of mLOY status across different tissues, intended to facilitate its study in large epidemiological studies.
|
25
|
Identifying chromosomal subpopulations based on their recombination histories advances the study of the genetic basis of phenotypic traits. Genome Res 2020; 30:1802-1814. [PMID: 33203765 PMCID: PMC7706724 DOI: 10.1101/gr.258301.119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/16/2019] [Accepted: 10/22/2020] [Indexed: 02/06/2023]
Abstract
Recombination is a main source of genetic variability. However, the potential role of the variation generated by recombination in phenotypic traits, including diseases, remains unexplored because there is currently no method to infer chromosomal subpopulations based on recombination pattern differences. We developed recombClust, a method that uses SNP-phased data to detect differences in historic recombination in a chromosome population. We validated our method by performing simulations and by using real data to accurately predict the alleles of well-known recombination modifiers, including common inversions in Drosophila melanogaster and human, and the chromosomes under selective pressure at the lactase locus in humans. We then applied recombClust to the complex human 1q21.1 region, where nonallelic homologous recombination produces deleterious phenotypes. We discovered and validated the presence of two different recombination histories in these regions that significantly associated with the differential expression of ANKRD35 in whole blood and that were in high linkage with variants previously associated with hypertension. By detecting differences in historic recombination, our method opens a way to assess the influence of recombination variation in phenotypic traits.
|
26
|
In utero and childhood exposure to tobacco smoke and multi-layer molecular signatures in children. BMC Med 2020; 18:243. [PMID: 32811491 PMCID: PMC7437049 DOI: 10.1186/s12916-020-01686-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/07/2020] [Accepted: 06/29/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND The adverse health effects of early life exposure to tobacco smoking have been widely reported. In spite of this, the underlying molecular mechanisms of in utero and postnatal exposure to tobacco smoke are only partially understood. Here, we aimed to identify multi-layer molecular signatures associated with exposure to tobacco smoke in these two exposure windows. METHODS We investigated the associations of maternal smoking during pregnancy and childhood secondhand smoke (SHS) exposure with molecular features measured in 1203 European children (mean age 8.1 years) from the Human Early Life Exposome (HELIX) project. Molecular features, covering 4 layers, included blood DNA methylation and gene and miRNA transcription, plasma proteins, and sera and urinary metabolites. RESULTS Maternal smoking during pregnancy was associated with DNA methylation changes at 18 loci in child blood. DNA methylation at 5 of these loci was related to expression of the nearby genes. However, the expression of these genes themselves was only weakly associated with maternal smoking. Conversely, childhood SHS was not associated with blood DNA methylation or transcription patterns, but with reduced levels of several serum metabolites and with increased plasma PAI1 (plasminogen activator inhibitor-1), a protein that inhibits fibrinolysis. Some of the in utero and childhood smoking-related molecular marks showed dose-response trends, with stronger effects with higher dose or longer duration of the exposure. CONCLUSION In this first study covering multi-layer molecular features, pregnancy and childhood exposure to tobacco smoke were associated with distinct molecular phenotypes in children. The persistent and dose-dependent changes in the methylome make CpGs good candidates to develop biomarkers of past exposure. 
Moreover, compared to methylation, the weak association of maternal smoking in pregnancy with gene expression suggests different reversal rates and a methylation-based memory to past exposures. Finally, certain metabolites and protein markers evidenced potential early biological effects of postnatal SHS, such as fibrinolysis.
|
27
|
Polymorphic Inversions Underlie the Shared Genetic Susceptibility of Obesity-Related Diseases. Am J Hum Genet 2020; 106:846-858. [PMID: 32470372 DOI: 10.1016/j.ajhg.2020.04.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/07/2020] [Accepted: 04/28/2020] [Indexed: 11/25/2022] Open
Abstract
The burden of several common diseases including obesity, diabetes, hypertension, asthma, and depression is increasing in most world populations. However, the mechanisms underlying the numerous epidemiological and genetic correlations among these disorders remain largely unknown. We investigated whether common polymorphic inversions underlie the shared genetic influence of these disorders. We performed an inversion association analysis including 21 inversions and 25 obesity-related traits on a total of 408,898 Europeans and validated the results in 67,299 independent individuals. Seven inversions were associated with multiple diseases while inversions at 8p23.1, 16p11.2, and 11q13.2 were strongly associated with the co-occurrence of obesity with other common diseases. Transcriptome analysis across numerous tissues revealed strong candidate genes for obesity-related traits. Analyses in human pancreatic islets indicated the potential mechanism of inversions in the susceptibility of diabetes by disrupting the cis-regulatory effect of SNPs from their target genes. Our data underscore the role of inversions as major genetic contributors to the joint susceptibility to common complex diseases.
|
28
|
Independent Multiple Factor Association Analysis for Multiblock Data in Imaging Genetics. Neuroinformatics 2020; 17:583-592. [PMID: 30903541 DOI: 10.1007/s12021-019-09416-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Multivariate methods have the potential to better capture complex relationships that may exist between different biological levels. Multiple Factor Analysis (MFA) is one of the most popular methods to obtain factor scores and measures of discrepancy between data sets. However, singular value decomposition in MFA is based on PCA, which is adequate only if the data is normally distributed, linear or stationary. In addition, including strongly correlated variables can overemphasize the contribution of the estimated components. In this work, we introduced a novel method referred to as Independent Multifactorial Analysis (ICA-MFA) to derive relevant features from multiscale data. This method is an extended implementation of MFA, where the component value decomposition is based on Independent Component Analysis. In addition, ICA-MFA incorporates a predictive step based on an Independent Component Regression. We evaluated and compared the performance of ICA-MFA with both the MFA method and traditional univariate analyses in a simulation study. We showed how ICA-MFA explained up to 10-fold more variance than MFA and univariate methods. We applied the proposed algorithm in a study of 4057 individuals belonging to the population-based Rotterdam Study with available genetic and neuroimaging data, as well as information about executive cognitive functioning. Specifically, we used ICA-MFA to detect relevant genetic features related to structural brain regions, which in turn were involved in the mechanisms of executive cognitive function. The proposed strategy makes it possible to determine the degree to which the whole set of genetic and/or neuroimaging markers contribute to the variability of the symptomatology jointly, rather than individually.
While univariate results and MFA combinations only explained a limited proportion of variance (less than 2%), our method increased the explained variance (10%) and allowed the identification of significant components that maximize the variance explained in the model. The potential application of the ICA-MFA algorithm constitutes an important aspect of integrating multivariate multiscale data, specifically in the field of Neurogenetics.
|
29
|
Dose and time effects of solar-simulated ultraviolet radiation on the in vivo human skin transcriptome. Br J Dermatol 2019; 182:1458-1468. [PMID: 31529490 PMCID: PMC7318624 DOI: 10.1111/bjd.18527] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Accepted: 09/10/2019] [Indexed: 12/18/2022]
Abstract
Background Terrestrial ultraviolet (UV) radiation causes erythema, oxidative stress, DNA mutations and skin cancer. Skin can adapt to these adverse effects by DNA repair, apoptosis, keratinization and tanning. Objectives To investigate the transcriptional response to fluorescent solar‐simulated radiation (FSSR) in sun‐sensitive human skin in vivo. Methods Seven healthy male volunteers were exposed to 0, 3 and 6 standard erythemal doses (SED). Skin biopsies were taken at 6 h and 24 h after exposure. Gene and microRNA expression were quantified with next generation sequencing. A set of candidate genes was validated by quantitative polymerase chain reaction (qPCR); and wavelength dependence was examined in other volunteers through microarrays. Results The number of differentially expressed genes increased with FSSR dose and decreased between 6 and 24 h. Six hours after 6 SED, 4071 genes were differentially expressed, but only 16 genes were affected at 24 h after 3 SED. Genes for apoptosis and keratinization were prominent at 6 h, whereas inflammation and immunoregulation genes were predominant at 24 h. Validation by qPCR confirmed the altered expression of nine genes detected under all conditions; genes related to DNA repair and apoptosis; immunity and inflammation; pigmentation; and vitamin D synthesis. In general, candidate genes also responded to UVA1 (340–400 nm) and/or UVB (300 nm), but with variations in wavelength dependence and peak expression time. Only four microRNAs were differentially expressed by FSSR. Conclusions The UV radiation doses of this acute study are readily achieved daily during holidays in the sun, suggesting that the skin transcriptional profile of ‘typical’ holiday makers is markedly deregulated. What's already known about this topic? The skin's transcriptional profile underpins its adverse (i.e. inflammation) and adaptive molecular, cellular and clinical responses (i.e. tanning, hyperkeratosis) to solar ultraviolet radiation. 
Few studies have assessed microRNA and gene expression in vivo in humans, and there is a lack of information on dose, time and waveband effects.
What does this study add? Acute doses of fluorescent solar‐simulated radiation (FSSR), of similar magnitude to those received daily in holiday situations, markedly altered the skin's transcriptional profiles. The number of differentially expressed genes was FSSR‐dose‐dependent, reached a peak at 6 h and returned to baseline at 24 h. The initial transcriptional response involved apoptosis and keratinization, followed by inflammation and immune modulation. In these conditions, microRNA expression was less affected than gene expression.
Linked Comment: Hart. Br J Dermatol 2020; 182:1328–1329.
|
30
|
Common polymorphic inversions at 17q21.31 and 8p23.1 associate with cancer prognosis. Hum Genomics 2019; 13:57. [PMID: 31753042 PMCID: PMC6873427 DOI: 10.1186/s40246-019-0242-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/01/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Chromosomal inversions are structural genetic variants where a chromosome segment changes its orientation. While sporadic de novo inversions are known genetic risk factors for cancer susceptibility, it is unknown if common polymorphic inversions are also associated with the prognosis of common tumors, as they have been linked to other complex diseases. We studied the association of two well-characterized human inversions at 17q21.31 and 8p23.1 with the prognosis of lung, liver, breast, colorectal, and stomach cancers. RESULTS Using data from The Cancer Genome Atlas (TCGA), we observed that inv8p23.1 was associated with overall survival in breast cancer and that inv17q21.31 was associated with overall survival in stomach cancer. In the meta-analysis of two independent studies, inv17q21.31 heterozygosity was significantly associated with colorectal disease-free survival. We found that the association was mediated by the de-methylation of cg08283464 and cg03999934, also linked to lower disease-free survival. CONCLUSIONS Our results suggest that chromosomal inversions are important genetic factors of tumor prognosis, likely affecting changes in methylation patterns.
|
31
|
scoreInvHap: Inversion genotyping for genome-wide association studies. PLoS Genet 2019; 15:e1008203. [PMID: 31269027 PMCID: PMC6608898 DOI: 10.1371/journal.pgen.1008203] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/28/2018] [Accepted: 05/17/2019] [Indexed: 02/02/2023] Open
Abstract
Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs), overcoming important limitations of current methods and outperforming them in accuracy and applicability. scoreInvHap calls individual inversion-genotypes from a similarity score to the SNPs of experimentally validated references. It can be used on different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adaptable to genotype new inversions, either in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to discover their role in complex human traits, and illustrate a first genome-wide association study of experimentally-validated human inversions. scoreInvHap is implemented in R and it is freely available from Bioconductor. Chromosomal inversions are structural variants consisting of an orientation change of a chromosome segment. Inversions have been linked to some phenotypic differences between individuals and to genetic divergence. However, their overall contribution to complex diseases is largely underdetermined as there are no high-throughput methods to call inversion-genotypes in large cohort studies. Here, we propose a new method, scoreInvHap, to call individual inversion genotypes from their haplotype similarity. We show that scoreInvHap has a high performance when analyzing heterogeneous sources of SNP data. Our current implementation contains 20 human inversions that can be readily genotyped in existing GWAS datasets. We exemplify the utility of scoreInvHap by running the first genome-wide association study of experimentally validated inversions and a multi-centric inversion association study.
All in all, scoreInvHap can substantially contribute to increase our knowledge of the role of chromosomal inversions in complex diseases by re-analyzing data from existing genetic association studies.
|
32
|
Abstract
OBJECTIVE ADHD consists of a count of symptoms that often presents heterogeneity due to overdispersion and excess of zeros. Statistical inference is usually based on a dichotomous outcome that is underpowered. The main goal of this study was to determine a suitable probability distribution to analyze ADHD symptoms in Imaging Genetic studies. METHOD We used two independent population samples of children to evaluate the consistency of the standard probability distributions based on count data for describing ADHD symptoms. RESULTS We showed that the zero-inflated negative binomial (ZINB) distribution provided the best power for modeling ADHD symptoms. ZINB revealed a genetic variant, rs273342 (Microtubule-Associated Protein [MAPRE2]), associated with ADHD (p value = 2.73E-05). This variant was also associated with perivascular volumes (Virchow-Robin spaces; p values < 1E-03). No associations were found when using a dichotomous definition. CONCLUSION We suggest that an appropriate modeling of ADHD symptoms increases statistical power to establish significant risk factors.
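As an illustration of the zero-inflated negative binomial (ZINB) distribution named in this abstract, the following minimal sketch writes out its probability mass function. The parameterization (mixing weight `pi`, size `r`, success probability `p`) and the function names are assumptions chosen for illustration; the abstract does not state how the authors parameterized or fitted the model.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: P(K = k) = C(k + r - 1, k) * p**r * (1 - p)**k."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, pi, r, p):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial above."""
    base = (1.0 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from the study.
    for k in range(4):
        print(k, zinb_pmf(k, pi=0.3, r=2.0, p=0.5))
```

The extra mass at zero is what lets the model describe the many children with no symptoms, while the size parameter `r` absorbs overdispersion in the positive counts.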
|
33
|
When pitch adds to volume: coregulation of transcript diversity predicts gene function. BMC Genomics 2018; 19:926. [PMID: 30545302 PMCID: PMC6293560 DOI: 10.1186/s12864-018-5263-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2018] [Accepted: 11/19/2018] [Indexed: 11/16/2022] Open
Abstract
Background Genes coregulate their overall transcript volumes to perform their physiological functions. However, it is unknown if they additionally coregulate their transcript diversities. We studied the reliability, consistency and functional associations of co-splicing correlations of genes of interest, across two independent studies, multiple tissues and two statistical methods. We thoroughly investigated the reproducibility of co-splicing correlations of APP, the candidate gene of Alzheimer’s disease (AD). We then studied how co-splicing correlations in different tissues contributed to predict functional interactions of three other genes and finally computed co-splicing frequency for 17 thousand genes across 52 human tissues. Results We replicated co-splicing correlations between APP and 5 AD-related genes and reproduced expected enrichment of APP co-splicing in synaptic vesicle cycle and proteasome pathways. We observed novel associations for tissue vulnerability to disease with enrichment in APP co-splicing, co-expression and epistasis in AD. APP co-splicing was the strongest predictor and replicated between studies. We confirmed known gene interactions of PRPF8 and GRIA1 in testis and brain cortex, and observed a novel interaction of FGFR2, in breast and prostate, modulated by cancer risk-variants. We produced a co-splicing map across 52 human tissues to help predict the function of over 17 thousand genes. Conclusions We show that coregulation of transcript diversities provides novel biological insights in gene physiology and helps to interpret GWAS results. Co-splicing correlations are reliable and frequent and should be further pursued to help predict gene function. Our results additionally support current AD interventions aiming at the ubiquitin proteasome pathway but unveil the need to consider transcript diversity in addition to volume to assess treatment response and susceptibility to the disease.
Electronic supplementary material The online version of this article (10.1186/s12864-018-5263-z) contains supplementary material, which is available to authorized users.
|
34
|
A systemic approach to identify signaling pathways activated during short-term exposure to traffic-related urban air pollution from human blood. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2018; 25:29572-29583. [PMID: 30141164 DOI: 10.1007/s11356-018-3009-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/14/2017] [Accepted: 08/17/2018] [Indexed: 06/08/2023]
Abstract
The molecular mechanisms that promote pathologic alterations in human physiology mediated by short-term exposure to traffic pollutants remain poorly understood. This work aimed to develop mechanistic networks to determine which specific pathways are activated by real-world exposures to traffic-related air pollution (TRAP) during rest and moderate physical activity (PA). A controlled crossover study comparing whole blood gene expression before and after short-term exposure to high and low levels of TRAP was performed together with systems biology analysis. Twenty-eight healthy volunteers aged between 21 and 53 years were recruited. These subjects were exposed for 2 h to different pollution levels (high and low TRAP levels), while either cycling or resting. Global transcriptome profiling of each condition was performed on human whole blood samples. Microarray analysis was performed to obtain differentially expressed genes (DEGs), which were used as initial input for the GeneMANIA software to obtain protein-protein interaction (PPI) networks. Two networks were found reflecting high or low TRAP levels, which shared only 5.6 and 15.5% of their nodes, suggesting that specific cell signaling pathways are activated in each environmental condition. However, gene ontology analysis of each PPI network suggests that each level of TRAP regulates common members of the NF-κB signaling pathway. Our work provides the first approach describing mechanistic networks to understand TRAP effects at a system level.
|
35
|
Sparse multiple factor analysis to integrate genetic data, neuroimaging features, and attention-deficit/hyperactivity disorder domains. Int J Methods Psychiatr Res 2018; 27:e1738. [PMID: 30105890 PMCID: PMC6877273 DOI: 10.1002/mpr.1738] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/05/2017] [Revised: 05/17/2018] [Accepted: 06/26/2018] [Indexed: 11/09/2022] Open
Abstract
OBJECTIVES We proposed the application of a multivariate cross-sectional framework based on a combination of a variable selection method and a multiple factor analysis (MFA) in order to identify complex meaningful biological signals related to attention-deficit/hyperactivity disorder (ADHD) symptoms and hyperactivity/inattention domains. METHODS The study included 135 children from the general population with genomic and neuroimaging data. ADHD symptoms were assessed using a questionnaire based on ADHD-DSM-IV criteria. In all analyses, the raw sum scores of the hyperactivity and inattention domains and total ADHD were used. The analytical framework comprised two steps. First, zero-inflated negative binomial linear model via penalized maximum likelihood (LASSO-ZINB) was performed. Second, the most predictive features obtained with LASSO-ZINB were used as input for the MFA. RESULTS We observed significant relationships between ADHD symptoms and hyperactivity and inattention domains with white matter, gray matter regions, and cerebellum, as well as with loci within chromosome 1. CONCLUSIONS Multivariate methods can be used to advance the neurobiological characterization of complex diseases, improving the statistical power with respect to univariate methods, allowing the identification of meaningful biological signals in Imaging Genetic studies.
|
36
|
Strategies for integrated analysis in imaging genetics studies. Neurosci Biobehav Rev 2018; 93:57-70. [PMID: 29944960 DOI: 10.1016/j.neubiorev.2018.06.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/29/2017] [Revised: 04/30/2018] [Accepted: 06/15/2018] [Indexed: 02/06/2023]
Abstract
Imaging Genetics (IG) integrates neuroimaging and genomic data from the same individual, deepening our knowledge of the biological mechanisms behind neurodevelopmental domains and neurological disorders. Although the literature on IG has grown exponentially over the past years, the majority of studies have mainly analyzed associations between candidate brain regions and individual genetic variants. However, this strategy is not designed to deal with the complexity of neurobiological mechanisms underlying behavioral and neurodevelopmental domains. Moreover, larger sample sizes and increased multidimensionality of this type of data represent a challenge for standardizing modeling procedures in IG research. This review provides a systematic update of the methods and strategies currently used in IG studies, and serves as an analytical framework for researchers working in this field. To complement the functionalities of the Neuroconductor framework, we also describe existing R packages that implement these methodologies. In addition, we present an overview of how these methodological approaches are applied in integrating neuroimaging and genetic data.
|
37
|
psygenet2r: a R/Bioconductor package for the analysis of psychiatric disease genes. Bioinformatics 2017; 33:4004-4006. [PMID: 28961763 PMCID: PMC5860088 DOI: 10.1093/bioinformatics/btx506] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/10/2017] [Revised: 08/04/2017] [Accepted: 08/08/2017] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Psychiatric disorders have a great impact on morbidity and mortality. Genotype-phenotype resources for psychiatric diseases are key to enable the translation of research findings to a better care of patients. PsyGeNET is a knowledge resource on psychiatric diseases and their genes, developed by text mining and curated by domain experts. RESULTS We present psygenet2r, an R package that contains a variety of functions for leveraging the PsyGeNET database and facilitating its analysis and interpretation. The package offers different types of queries to the database along with a variety of analysis and visualization tools, including the study of the anatomical structures in which the genes are expressed and gaining insight into the genes' molecular functions. psygenet2r is especially suited for network medicine analysis of psychiatric disorders. AVAILABILITY AND IMPLEMENTATION The package is implemented in R and is available under MIT license from Bioconductor (http://bioconductor.org/packages/release/bioc/html/psygenet2r.html). CONTACT juanr.gonzalez@isglobal.org or laura.furlong@upf.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
38
|
Redundancy analysis allows improved detection of methylation changes in large genomic regions. BMC Bioinformatics 2017; 18:553. [PMID: 29237399 PMCID: PMC5729265 DOI: 10.1186/s12859-017-1986-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/24/2017] [Accepted: 12/05/2017] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND DNA methylation is an epigenetic process that regulates gene expression. Methylation can be modified by environmental exposures and changes in the methylation patterns have been associated with diseases. Methylation microarrays measure methylation levels at more than 450,000 CpGs in a single experiment, and the most common analysis strategy is to perform a single probe analysis to find methylation probes associated with the outcome of interest. However, methylation changes usually occur at the regional level: for example, genomic structural variants can affect methylation patterns in regions up to several megabases in length. Existing methods provide lists of Differentially Methylated Regions (DMRs) of only up to a few kilobases in length, and cannot check if a target region is differentially methylated. Therefore, these methods are not suitable to evaluate methylation changes in large regions. To address these limitations, we developed a new DMR approach based on redundancy analysis (RDA) that assesses whether a target region is differentially methylated. RESULTS Using simulated and real datasets, we compared our approach to three common DMR detection methods (Bumphunter, blockFinder, and DMRcate). We found that Bumphunter underestimated methylation changes and blockFinder showed poor performance. DMRcate showed poor power in the simulated datasets and low specificity in the real data analysis. Our method showed very high performance in all simulation settings, even with small sample sizes and subtle methylation changes, while controlling type I error. Other advantages of our method are: 1) it estimates the degree of association between the DMR and the outcome; 2) it can analyze a targeted region of interest; and 3) it can evaluate the simultaneous effects of different variables. The proposed methodology is implemented in MEAL, a Bioconductor package designed to facilitate the analysis of methylation data.
CONCLUSIONS We propose a multivariate approach to decipher whether an outcome of interest alters the methylation pattern of a region of interest. The method is designed to analyze large target genomic regions and outperforms the three most popular methods for detecting DMRs. Our method can evaluate factors with more than two levels or the simultaneous effect of more than one continuous variable, which is not possible with the state-of-the-art methods.
|
39
|
The acute effects of ultraviolet radiation on the blood transcriptome are independent of plasma 25OHD3. ENVIRONMENTAL RESEARCH 2017; 159:239-248. [PMID: 28822308 DOI: 10.1016/j.envres.2017.07.045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/07/2017] [Revised: 07/05/2017] [Accepted: 07/25/2017] [Indexed: 06/07/2023]
Abstract
The molecular basis of many health outcomes attributed to solar ultraviolet radiation (UVR) is unknown. We tested the hypothesis that they may originate from transcriptional changes in blood cells. This was determined by assessing the effect of fluorescent solar simulated radiation (FSSR) on the transcriptional profile of peripheral blood pre- and 6h, 24h and 48h post-exposure in nine healthy volunteers. Expression of 20 genes was down-regulated and one was up-regulated at 6h after FSSR. All recovered to baseline expression at 24h or 48h. These genes have been associated with immune regulation, cancer and blood pressure; health effects attributed to vitamin D via solar UVR exposure. Plasma 25-hydroxyvitamin D3 [25OHD3] levels increased over time after FSSR and were maximal at 48h. The increase was more pronounced in participants with low basal 25OHD3 levels. Mediation analyses suggested that changes in gene expression due to FSSR were independent of 25OHD3 and blood cell subpopulations.
|
40
|
A systematic comparison of statistical methods to detect interactions in exposome-health associations. Environ Health 2017; 16:74. [PMID: 28709428 PMCID: PMC5513197 DOI: 10.1186/s12940-017-0277-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 11/10/2016] [Accepted: 06/11/2017] [Indexed: 05/20/2023]
Abstract
BACKGROUND There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (being defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging owing to the large number of exposures, several of them being highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions. METHODS Simulations were based on an exposome including 237 exposures with a realistic correlation structure. We considered several statistical regression-based methods, including two-step Environment-Wide Association Study (EWAS2), the Deletion/Substitution/Addition (DSA) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO), Group-Lasso INTERaction-NET (GLINTERNET), a three-step method based on regression trees and finally Boosted Regression Trees (BRT). We assessed the performance of each method in terms of model size, predictive ability, sensitivity and false discovery rate. RESULTS GLINTERNET and DSA had better overall performance than the other methods, with GLINTERNET having better properties in terms of selecting the true predictors (sensitivity) and of predictive ability, while DSA had a lower number of false positives. In terms of ability to capture interaction terms, GLINTERNET and DSA had again the best performances, with the same trade-off between sensitivity and false discovery proportion. When GLINTERNET and DSA failed to select an exposure truly associated with the outcome, they tended to select a highly correlated one. When interactions were not present in the data, using variable selection methods that allowed for interactions had only slight costs in performance compared to methods that only searched for main effects. 
CONCLUSIONS GLINTERNET and DSA provided better performance in detecting two-way interactions, compared to other existing methods.
|
41
|
Novel genes involved in severe early-onset obesity revealed by rare copy number and sequence variants. PLoS Genet 2017; 13:e1006657. [PMID: 28489853 PMCID: PMC5443539 DOI: 10.1371/journal.pgen.1006657] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/01/2016] [Revised: 05/24/2017] [Accepted: 02/26/2017] [Indexed: 12/26/2022] Open
Abstract
Obesity is a multifactorial disorder with high heritability (50–75%), which is probably higher in early-onset and severe cases. Although rare monogenic forms and several genes and regions of susceptibility, including copy number variants (CNVs), have been described, the genetic causes underlying the disease still remain largely unknown. We searched for rare CNVs (>100kb in size, altering genes and present in <1/2000 population controls) in 157 Spanish children with non-syndromic early-onset obesity (EOO: body mass index >3 standard deviations above the mean at <3 years of age) using SNP array molecular karyotypes. We then performed case control studies (480 EOO cases/480 non-obese controls) with the validated CNVs and rare sequence variants (RSVs) detected by targeted resequencing of selected CNV genes (n = 14), and also studied the inheritance patterns in available first-degree relatives. A higher burden of gain-type CNVs was detected in EOO cases versus controls (OR = 1.71, p-value = 0.0358). In addition to a gain of the NPY gene in a familial case with EOO and attention deficit hyperactivity disorder, likely pathogenic CNVs included gains of glutamate receptors (GRIK1, GRM7) and the X-linked gastrin-peptide receptor (GRPR), all inherited from obese parents. Putatively functional RSVs absent in controls were also identified in EOO cases at NPY, GRIK1 and GRPR. A patient with a heterozygous deletion disrupting two contiguous and related genes, SLCO4C1 and SLCO6A1, also had a missense RSV at SLCO4C1 on the other allele, suggestive of a recessive model. The genes identified showed a clear enrichment of shared co-expression partners with known genes strongly related to obesity, reinforcing their role in the pathophysiology of the disease. Our data reveal a higher burden of rare CNVs and RSVs in several related genes in patients with EOO compared to controls, and implicate NPY, GRPR, two glutamate receptors and SLCO4C1 in highly penetrant forms of familial obesity. 
Although there is strong evidence for a high genetic component of obesity, the underlying genetic causes are largely unknown, mostly due to the highly heterogeneous nature of the disorder. In this work, we have focused on the most severe end of the spectrum, severe obesity with early onset in childhood, which is more likely due to genetic alterations. We screened a sample of 157 Spanish children with early-onset obesity for rare copy number variation (CNV) using molecular karyotypes and then studied the genes altered by CNVs in 480 cases and 480 non-obese controls. We identified a higher burden of gain-type CNVs in cases as well as several CNVs and sequence variants that were specific to the obese population. Interestingly, the genes identified shared co-expression partners with known obesity genes. Among those, the genes encoding the neuropeptide Y (NPY), two glutamate receptors (GRIK1, GRM7), the X-linked gastrin-peptide receptor (GRPR), and the organic anion transporter (SLCO4C1) are novel obesity candidate genes that may contribute to highly penetrant forms of familial obesity.
|
42
|
Polymorphisms in the SNRPN gene are associated with obesity susceptibility in a Spanish population. J Gene Med 2017; 19. [PMID: 28387446 DOI: 10.1002/jgm.2956] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/10/2017] [Revised: 03/15/2017] [Accepted: 04/04/2017] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND SNRPN, which codes for the RNA-binding SmN protein, is a candidate gene for Prader-Willi syndrome. One characteristic of this neuroendocrine disorder is hyperphagia resulting in extreme obesity later in life. In the present study, we aimed to assess whether variability within this gene could be implicated in obesity susceptibility. METHODS A case-control study was performed including 265 unrelated patients with nonsyndromic and early-onset severe obesity, belonging to high-risk obesity families of Spanish ancestry; 184 sex-matched healthy control individuals representative of the same genetic background were included. Forty-nine single nucleotide polymorphisms (SNPs) spanning the entire SNRPN gene were selected and genotyped using the Sequenom MassARRAY platform (Sequenom Inc., San Diego, CA, USA). RESULTS The four SNPs, rs12905653, rs752874, rs1391516 and rs2047433, were found to be nominally associated with obesity (p < 0.03). The diversity haplotype distribution among cases and controls identified the combination rs12905653-T/rs8028366-A/rs4028395-T as being strongly and inversely associated with obesity (odds ratio = 0.49; p = 0.0006). A genetic risk score was built based on the rs12905653, rs1391516 and rs2047433 SNPs, and each unit increase in genetic risk score increased the obesity risk by 49% (odds ratio = 1.49, 95% confidence interval = 1.24-1.80). CONCLUSIONS To our knowledge, this is the first study reporting an association between variability in the SNRPN gene and the risk of being obese. Interestingly, it was the major allele of each SNP that was found to be associated with the risk of weight gain. Further studies analyzing this locus and the possible additive deleterious capability of SNP combinations could be useful for demonstrating the development of obesity.
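To make the per-unit odds ratio in this abstract concrete, the short sketch below computes the relative odds implied by a difference of several score units. It assumes the score enters a logistic model linearly, so that odds ratios multiply across units; that modeling assumption and the function name are illustrative and are not stated in the abstract.

```python
OR_PER_UNIT = 1.49  # odds ratio per one-unit increase, as reported in the abstract


def relative_odds(score_difference, or_per_unit=OR_PER_UNIT):
    """Relative odds of obesity for a given difference in genetic risk score,
    assuming a log-linear (logistic) relationship between score and odds."""
    return or_per_unit ** score_difference


if __name__ == "__main__":
    for diff in (1, 2, 3):
        print(diff, round(relative_odds(diff), 4))
```

Under this assumption, a two-unit difference corresponds to 1.49 × 1.49 ≈ 2.22 times the odds. Note that an odds ratio approximates a risk ratio only when the outcome is rare.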
|
43
|
MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration. BMC Bioinformatics 2017; 18:36. [PMID: 28095799 PMCID: PMC5240259 DOI: 10.1186/s12859-016-1455-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/11/2016] [Accepted: 12/24/2016] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor's methods and classes implemented in different packages manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. RESULTS To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS MultiDataSet is a suitable class for data integration under the R and Bioconductor framework.
|
44
|
Imaging genetics in attention-deficit/hyperactivity disorder and related neurodevelopmental domains: state of the art. Brain Imaging Behav 2016; 11:1922-1931. [DOI: 10.1007/s11682-016-9663-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
|
45
|
A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations. ENVIRONMENTAL HEALTH PERSPECTIVES 2016; 124:1848-1856. [PMID: 27219331 PMCID: PMC5132632 DOI: 10.1289/ehp172] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 05/26/2015] [Revised: 01/12/2016] [Accepted: 04/28/2016] [Indexed: 05/17/2023]
Abstract
BACKGROUND The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. OBJECTIVES We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. METHODS In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. RESULTS On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. CONCLUSIONS Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. 
Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
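The two criteria used in this abstract to compare methods, false discovery proportion (FDP) and sensitivity, can be sketched as follows. The function name, the set-based inputs and the convention of returning 0 for FDP when nothing is selected are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
def fdp_and_sensitivity(selected, true_predictors):
    """Return (FDP, sensitivity) for one simulation run.

    FDP: share of selected exposures that are not true predictors.
    Sensitivity: share of true predictors that were selected.
    """
    selected = set(selected)
    true_predictors = set(true_predictors)
    true_pos = len(selected & true_predictors)
    # Convention assumed here: FDP is 0 when no exposure is selected.
    fdp = (len(selected) - true_pos) / len(selected) if selected else 0.0
    # Sensitivity is undefined when the outcome depends on no exposure.
    sensitivity = (
        true_pos / len(true_predictors) if true_predictors else float("nan")
    )
    return fdp, sensitivity


if __name__ == "__main__":
    # Hypothetical exposure labels, not taken from the study.
    print(fdp_and_sensitivity({"x1", "x2", "x3", "x4"}, {"x1", "x2", "x5"}))
```

In the hypothetical call, two of four selected exposures are false discoveries and two of three true predictors are recovered, which illustrates how a method can have high sensitivity and high FDP at the same time.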
|
46
|
Genetic polymorphisms associated with increased risk of developing chronic myelogenous leukemia. Oncotarget 2016; 6:36269-77. [PMID: 26474455 PMCID: PMC4742176 DOI: 10.18632/oncotarget.5915] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/12/2015] [Accepted: 09/14/2015] [Indexed: 12/22/2022] Open
Abstract
Little is known about inherited factors associated with the risk of developing chronic myelogenous leukemia (CML). We used a dedicated DNA chip containing 16 561 single nucleotide polymorphisms (SNPs) covering 1 916 candidate genes to analyze 437 CML patients and 1 144 healthy control individuals. Single SNP association analysis identified 139 SNPs that passed multiple comparisons (1% false discovery rate). The HDAC9, AVEN, SEMA3C, IKBKB, GSTA3, RIPK1 and FGF2 genes were each represented by three SNPs, the PSM family by four SNPs and the SLC15A1 gene by six. Haplotype analysis showed that certain combinations of rare alleles of these genes increased the risk of developing CML by more than two or three-fold. A classification tree model identified five SNPs belonging to the genes PSMB10, TNFRSF10D, PSMB2, PPARD and CYP26B1, which were associated with CML predisposition. A CML-risk-allele score was created using these five SNPs. This score was accurate for discriminating CML status (AUC: 0.61, 95%CI: 0.58-0.64). Interestingly, the score was associated with age at diagnosis and the average number of risk alleles was significantly higher in younger patients. The risk-allele score showed the same distribution in the general population (HapMap CEU samples) as in our control individuals and was associated with differential gene expression patterns of two genes (VAPA and TDRKH). In conclusion, we describe haplotypes and a genetic score that are significantly associated with a predisposition to develop CML. The SNPs identified will also serve to drive fundamental research on the putative role of these genes in CML development.
|
47
|
Ancient Haplotypes at the 15q24.2 Microdeletion Region Are Linked to Brain Expression of MAN2C1 and Children's Intelligence. PLoS One 2016; 11:e0157739. [PMID: 27355585 PMCID: PMC4927142 DOI: 10.1371/journal.pone.0157739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2016] [Accepted: 06/05/2016] [Indexed: 11/26/2022] Open
Abstract
The chromosome bands 15q24.1-15q24.3 contain a complex region with numerous segmental duplications that predispose to regional microduplications and microdeletions, both of which have been linked to intellectual disability, speech delay and autistic features. The region may also harbour common inversion polymorphisms whose functional and phenotypic manifestations are unknown. Using single nucleotide polymorphism (SNP) data, we detected four large contiguous haplotype-genotypes at 15q24 with Mendelian inheritance in 2,562 trios, African origin, high population stratification and reduced recombination rates. Although the haplotype-genotypes have most likely been generated by decreased or absent recombination among them, we could not confirm that they were the product of inversion polymorphisms in the region. One of the blocks was composed of three haplotype-genotypes (N1a, N1b and N2), which significantly correlated with intelligence quotient (IQ) in 2,735 children of European ancestry from three independent population cohorts. Homozygosity for N2 was associated with lower verbal IQ (2.4-point loss, p-value = 0.01), while homozygosity for N1b was associated with a 3.2-point loss in non-verbal IQ (p-value = 0.0006). The three alleles strongly correlated with expression levels of MAN2C1 and SNUPN in blood and brain. Homozygosity for N2 correlated with over-expression of MAN2C1 across many brain areas except the occipital cortex, where N1b homozygotes strongly under-expressed the gene. Our population-based analyses suggest that MAN2C1 may contribute to the verbal difficulties observed in microduplications and to the intellectual disability of microdeletion syndromes, whose characteristic dosage increment and removal may affect different brain areas.
|
48
|
APOE and MS4A6A interact with GnRH signaling in Alzheimer's disease: Enrichment of epistatic effects. Alzheimers Dement 2016; 13:493-497. [DOI: 10.1016/j.jalz.2016.05.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/16/2015] [Revised: 05/04/2016] [Accepted: 05/22/2016] [Indexed: 10/21/2022]
|
49
|
Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation. Nat Commun 2015; 6:8658. [PMID: 26635082 PMCID: PMC4686825 DOI: 10.1038/ncomms9658] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 02/05/2015] [Accepted: 09/17/2015] [Indexed: 01/11/2023] Open
Abstract
Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered.
|
50
|
affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling. BMC Bioinformatics 2015; 16:167. [PMID: 25991004 PMCID: PMC4438530 DOI: 10.1186/s12859-015-0608-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/24/2014] [Accepted: 04/30/2015] [Indexed: 12/02/2022] Open
Abstract
Background: Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Other structural variants, such as copy number variants and DNA inversions, whether germ-line or arising as mosaic events, are now being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) in structural variant studies. Results: We illustrate the capabilities of affy2sv using two complete pipelines on real data. The first performs a GWAS and a mosaic alteration detection study; the second detects CNVs and performs inversion calling. Conclusion: Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies where different types of structural variants are considered.
|