26
Vives-Usano M, Hernandez-Ferrer C, Maitre L, Ruiz-Arenas C, Andrusaityte S, Borràs E, Carracedo Á, Casas M, Chatzi L, Coen M, Estivill X, González JR, Grazuleviciene R, Gutzkow KB, Keun HC, Lau CHE, Cadiou S, Lepeule J, Mason D, Quintela I, Robinson O, Sabidó E, Santorelli G, Schwarze PE, Siskos AP, Slama R, Vafeiadi M, Martí E, Vrijheid M, Bustamante M. In utero and childhood exposure to tobacco smoke and multi-layer molecular signatures in children. BMC Med 2020; 18:243. [PMID: 32811491 PMCID: PMC7437049 DOI: 10.1186/s12916-020-01686-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/07/2020] [Accepted: 06/29/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND The adverse health effects of early-life exposure to tobacco smoking have been widely reported. Despite this, the underlying molecular mechanisms of in utero and postnatal exposure to tobacco smoke are only partially understood. Here, we aimed to identify multi-layer molecular signatures associated with exposure to tobacco smoke in these two exposure windows. METHODS We investigated the associations of maternal smoking during pregnancy and childhood secondhand smoke (SHS) exposure with molecular features measured in 1203 European children (mean age 8.1 years) from the Human Early Life Exposome (HELIX) project. Molecular features, covering 4 layers, included blood DNA methylation, blood gene and miRNA transcription, plasma proteins, and serum and urinary metabolites. RESULTS Maternal smoking during pregnancy was associated with DNA methylation changes at 18 loci in child blood. DNA methylation at 5 of these loci was related to expression of the nearby genes. However, the expression of these genes themselves was only weakly associated with maternal smoking. Conversely, childhood SHS was not associated with blood DNA methylation or transcription patterns, but with reduced levels of several serum metabolites and with increased plasma PAI1 (plasminogen activator inhibitor-1), a protein that inhibits fibrinolysis. Some of the in utero and childhood smoking-related molecular marks showed dose-response trends, with stronger effects with higher dose or longer duration of the exposure. CONCLUSION In this first study covering multi-layer molecular features, pregnancy and childhood exposure to tobacco smoke were associated with distinct molecular phenotypes in children. The persistent and dose-dependent changes in the methylome make CpGs good candidates to develop biomarkers of past exposure.
Moreover, compared with methylation, the weak association of maternal smoking in pregnancy with gene expression suggests different reversal rates and a methylation-based memory of past exposures. Finally, certain metabolites and protein markers provided evidence of potential early biological effects of postnatal SHS, such as on fibrinolysis.
27
González JR, Ruiz-Arenas C, Cáceres A, Morán I, López-Sánchez M, Alonso L, Tolosana I, Guindo-Martínez M, Mercader JM, Esko T, Torrents D, González J, Pérez-Jurado LA. Polymorphic Inversions Underlie the Shared Genetic Susceptibility of Obesity-Related Diseases. Am J Hum Genet 2020; 106:846-858. [PMID: 32470372 DOI: 10.1016/j.ajhg.2020.04.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/07/2020] [Accepted: 04/28/2020] [Indexed: 11/25/2022]
Abstract
The burden of several common diseases including obesity, diabetes, hypertension, asthma, and depression is increasing in most world populations. However, the mechanisms underlying the numerous epidemiological and genetic correlations among these disorders remain largely unknown. We investigated whether common polymorphic inversions underlie the shared genetic influence of these disorders. We performed an inversion association analysis including 21 inversions and 25 obesity-related traits on a total of 408,898 Europeans and validated the results in 67,299 independent individuals. Seven inversions were associated with multiple diseases, while inversions at 8p23.1, 16p11.2, and 11q13.2 were strongly associated with the co-occurrence of obesity with other common diseases. Transcriptome analysis across numerous tissues revealed strong candidate genes for obesity-related traits. Analyses in human pancreatic islets indicated a potential mechanism by which inversions influence susceptibility to diabetes: disrupting the cis-regulatory effects of SNPs on their target genes. Our data underscore the role of inversions as major genetic contributors to the joint susceptibility to common complex diseases.
28
Vilor-Tejedor N, Ikram MA, Roshchupkin GV, Cáceres A, Alemany S, Vernooij MW, Niessen WJ, van Duijn CM, Sunyer J, Adams HH, González JR. Independent Multiple Factor Association Analysis for Multiblock Data in Imaging Genetics. Neuroinformatics 2020; 17:583-592. [PMID: 30903541 DOI: 10.1007/s12021-019-09416-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Multivariate methods have the potential to better capture complex relationships that may exist between different biological levels. Multiple Factor Analysis (MFA) is one of the most popular methods to obtain factor scores and measures of discrepancy between data sets. However, the singular value decomposition in MFA is based on PCA, which is adequate only if the data are normally distributed, linear or stationary. In addition, including strongly correlated variables can overemphasize the contribution of the estimated components. In this work, we introduced a novel method referred to as Independent Multifactorial Analysis (ICA-MFA) to derive relevant features from multiscale data. This method is an extended implementation of MFA, where the component value decomposition is based on Independent Component Analysis. In addition, ICA-MFA incorporates a predictive step based on an Independent Component Regression. We evaluated and compared the performance of ICA-MFA with both the MFA method and traditional univariate analyses in a simulation study. We showed that ICA-MFA explained up to 10-fold more variance than MFA and univariate methods. We applied the proposed algorithm in a study of 4057 individuals belonging to the population-based Rotterdam Study with available genetic and neuroimaging data, as well as information about executive cognitive functioning. Specifically, we used ICA-MFA to detect relevant genetic features related to structural brain regions, which in turn were involved in the mechanisms of executive cognitive function. The proposed strategy makes it possible to determine the degree to which the whole set of genetic and/or neuroimaging markers jointly, rather than individually, contributes to the variability of the symptomatology.
While univariate results and MFA combinations only explained a limited proportion of variance (less than 2%), our method increased the explained variance to 10% and allowed the identification of significant components that maximize the variance explained in the model. The ICA-MFA algorithm is therefore a potentially important tool for integrating multivariate multiscale data, specifically in the field of neurogenetics.
29
Bustamante M, Hernandez-Ferrer C, Tewari A, Sarria Y, Harrison GI, Puigdecanet E, Nonell L, Kang W, Friedländer MR, Estivill X, González JR, Nieuwenhuijsen M, Young AR. Dose and time effects of solar-simulated ultraviolet radiation on the in vivo human skin transcriptome. Br J Dermatol 2019; 182:1458-1468. [PMID: 31529490 PMCID: PMC7318624 DOI: 10.1111/bjd.18527] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Accepted: 09/10/2019] [Indexed: 12/18/2022]
Abstract
Background Terrestrial ultraviolet (UV) radiation causes erythema, oxidative stress, DNA mutations and skin cancer. Skin can adapt to these adverse effects by DNA repair, apoptosis, keratinization and tanning. Objectives To investigate the transcriptional response to fluorescent solar‐simulated radiation (FSSR) in sun‐sensitive human skin in vivo. Methods Seven healthy male volunteers were exposed to 0, 3 and 6 standard erythemal doses (SED). Skin biopsies were taken at 6 h and 24 h after exposure. Gene and microRNA expression were quantified with next-generation sequencing. A set of candidate genes was validated by quantitative polymerase chain reaction (qPCR), and wavelength dependence was examined in other volunteers using microarrays. Results The number of differentially expressed genes increased with FSSR dose and decreased between 6 and 24 h. Six hours after 6 SED, 4071 genes were differentially expressed, but only 16 genes were affected at 24 h after 3 SED. Genes for apoptosis and keratinization were prominent at 6 h, whereas inflammation and immunoregulation genes were predominant at 24 h. Validation by qPCR confirmed the altered expression of nine genes detected under all conditions; these included genes related to DNA repair and apoptosis, immunity and inflammation, pigmentation, and vitamin D synthesis. In general, candidate genes also responded to UVA1 (340–400 nm) and/or UVB (300 nm), but with variations in wavelength dependence and peak expression time. Only four microRNAs were differentially expressed after FSSR. Conclusions The UV radiation doses of this acute study are readily achieved daily during holidays in the sun, suggesting that the skin transcriptional profile of ‘typical’ holidaymakers is markedly deregulated.
What's already known about this topic? The skin's transcriptional profile underpins its adverse (e.g. inflammation) and adaptive (e.g. tanning, hyperkeratosis) molecular, cellular and clinical responses to solar ultraviolet radiation. Few studies have assessed microRNA and gene expression in vivo in humans, and there is a lack of information on dose, time and waveband effects.
What does this study add? Acute doses of fluorescent solar‐simulated radiation (FSSR), of similar magnitude to those received daily in holiday situations, markedly altered the skin's transcriptional profiles. The number of differentially expressed genes was FSSR‐dose‐dependent, reached a peak at 6 h and returned to baseline at 24 h. The initial transcriptional response involved apoptosis and keratinization, followed by inflammation and immune modulation. In these conditions, microRNA expression was less affected than gene expression.
Linked Comment: Hart. Br J Dermatol 2020; 182:1328–1329. Plain language summary available online.
30
Ruiz-Arenas C, Cáceres A, Moreno V, González JR. Common polymorphic inversions at 17q21.31 and 8p23.1 associate with cancer prognosis. Hum Genomics 2019; 13:57. [PMID: 31753042 PMCID: PMC6873427 DOI: 10.1186/s40246-019-0242-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/01/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022]
Abstract
BACKGROUND Chromosomal inversions are structural genetic variants in which a chromosome segment changes its orientation. While sporadic de novo inversions are known genetic risk factors for cancer susceptibility, it is unknown whether common polymorphic inversions, which have been linked to other complex diseases, are also associated with the prognosis of common tumors. We studied the association of two well-characterized human inversions at 17q21.31 and 8p23.1 with the prognosis of lung, liver, breast, colorectal, and stomach cancers. RESULTS Using data from The Cancer Genome Atlas (TCGA), we observed that inv8p23.1 was associated with overall survival in breast cancer and that inv17q21.31 was associated with overall survival in stomach cancer. In the meta-analysis of two independent studies, inv17q21.31 heterozygosity was significantly associated with colorectal disease-free survival. We found that the association was mediated by the demethylation of cg08283464 and cg03999934, which was also linked to lower disease-free survival. CONCLUSIONS Our results suggest that chromosomal inversions are important genetic factors in tumor prognosis, likely acting through changes in methylation patterns.
31
Ruiz-Arenas C, Cáceres A, López-Sánchez M, Tolosana I, Pérez-Jurado L, González JR. scoreInvHap: Inversion genotyping for genome-wide association studies. PLoS Genet 2019; 15:e1008203. [PMID: 31269027 PMCID: PMC6608898 DOI: 10.1371/journal.pgen.1008203] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/28/2018] [Accepted: 05/17/2019] [Indexed: 02/02/2023]
Abstract
Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs), overcoming important limitations of current methods and outperforming them in accuracy and applicability. scoreInvHap calls individual inversion genotypes from a similarity score to the SNPs of experimentally validated references. It can be used on different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adaptable to genotype new inversions, either in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to discover their role in complex human traits, and illustrate a first genome-wide association study of experimentally validated human inversions. scoreInvHap is implemented in R and is freely available from Bioconductor.
Chromosomal inversions are structural variants consisting of an orientation change of a chromosome segment. Inversions have been linked to some phenotypic differences between individuals and to genetic divergence. However, their overall contribution to complex diseases is largely undetermined, as there are no high-throughput methods to call inversion genotypes in large cohort studies. Here, we propose a new method, scoreInvHap, to call individual inversion genotypes from their haplotype similarity. We show that scoreInvHap performs well when analyzing heterogeneous sources of SNP data. Our current implementation contains 20 human inversions that can be readily genotyped in existing GWAS datasets. We exemplify the utility of scoreInvHap by running the first genome-wide association study of experimentally validated inversions and a multi-centric inversion association study. All in all, scoreInvHap can substantially increase our knowledge of the role of chromosomal inversions in complex diseases by re-analyzing data from existing genetic association studies.
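The central idea, assigning each individual the inversion genotype whose reference SNP profile it most resembles, can be illustrated with a toy Python sketch. This is not scoreInvHap's implementation (the package is written in R and scores similarity against haplotypes of experimentally validated references); the reference profiles, the 0/1/2 dosage coding, the per-SNP scoring rule and the function names below are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical reference profiles: expected alternative-allele dosage
# (0, 1 or 2) at five inversion-informative SNPs for each inversion
# genotype (NN = standard homozygote, NI = heterozygote,
# II = inverted homozygote). The values are invented for illustration.
REFERENCES: Dict[str, List[int]] = {
    "NN": [0, 0, 2, 0, 1],
    "NI": [1, 1, 1, 1, 1],
    "II": [2, 2, 0, 2, 1],
}


def similarity(sample: List[Optional[int]], reference: List[int]) -> float:
    """Mean per-SNP agreement in [0, 1]: 1 for identical dosages, 0.5 for a
    one-allele difference, 0 for opposite homozygotes. Missing calls
    (None) are skipped, as would happen with low-coverage SNP data."""
    scores = [
        1.0 - abs(s - r) / 2.0
        for s, r in zip(sample, reference)
        if s is not None
    ]
    if not scores:
        raise ValueError("no non-missing SNPs to compare")
    return sum(scores) / len(scores)


def call_inversion(sample: List[Optional[int]]) -> str:
    """Return the inversion genotype whose reference profile is most similar."""
    return max(REFERENCES, key=lambda g: similarity(sample, REFERENCES[g]))
```

For instance, `call_inversion([2, 2, 0, None, 1])` returns `"II"`: the sample agrees with the II profile at all four observed SNPs (score 1.0), against 0.625 for NI and 0.25 for NN. Skipping missing SNPs rather than penalizing them is one simple way to tolerate sparse data such as exome-derived calls.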
32
Vilor-Tejedor N, Alemany S, Forns J, Cáceres A, Murcia M, Macià D, Pujol J, Sunyer J, González JR. Assessment of Susceptibility Risk Factors for ADHD in Imaging Genetic Studies. J Atten Disord 2019; 23:671-681. [PMID: 27535943 DOI: 10.1177/1087054716664408] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
Abstract
OBJECTIVE ADHD is assessed as a count of symptoms, which often shows heterogeneity due to overdispersion and an excess of zeros. Statistical inference is usually based on a dichotomous outcome, which is underpowered. The main goal of this study was to determine a suitable probability distribution to analyze ADHD symptoms in Imaging Genetic studies. METHOD We used two independent population samples of children to evaluate the consistency of standard probability distributions for count data in describing ADHD symptoms. RESULTS We showed that the zero-inflated negative binomial (ZINB) distribution provided the best power for modeling ADHD symptoms. ZINB revealed a genetic variant, rs273342 (Microtubule-Associated Protein [MAPRE2]), associated with ADHD (p = 2.73E-05). This variant was also associated with perivascular volumes (Virchow-Robin spaces; p < 1E-03). No associations were found when using the dichotomous definition. CONCLUSION We suggest that appropriate modeling of ADHD symptoms increases statistical power to establish significant risk factors.
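To make the zero-inflation idea concrete, here is a minimal Python sketch of the ZINB probability mass function: with probability pi a count is a structural zero, otherwise it comes from a negative binomial. The mean/dispersion parameterization, the function names and the parameter values in the note below are illustrative assumptions, not estimates from the study, and model fitting with covariates (e.g., by maximum likelihood) is not shown.

```python
import math


def nb_pmf(k: int, mu: float, size: float) -> float:
    """Negative binomial P(Y = k) with mean mu > 0 and dispersion
    ('size') r > 0, so that Var(Y) = mu + mu**2 / r (overdispersion)."""
    log_p = (
        math.lgamma(k + size)
        - math.lgamma(size)
        - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, mu: float, size: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural
    zero; otherwise it is drawn from NB(mu, size)."""
    if k < 0:
        return 0.0
    base = (1.0 - pi) * nb_pmf(k, mu, size)
    # Zeros can arise from either component, so they get extra mass.
    return pi + base if k == 0 else base
```

With mu = 3, size = 1.5 and pi = 0.3, the plain negative binomial gives P(Y = 0) = (1/3)^1.5 ≈ 0.192, whereas the ZINB gives 0.3 + 0.7 × 0.192 ≈ 0.435. That extra probability mass at zero is what lets the model accommodate the many children reporting no symptoms.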
33
Cáceres A, González JR. When pitch adds to volume: coregulation of transcript diversity predicts gene function. BMC Genomics 2018; 19:926. [PMID: 30545302 PMCID: PMC6293560 DOI: 10.1186/s12864-018-5263-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2018] [Accepted: 11/19/2018] [Indexed: 11/16/2022]
Abstract
Background Genes coregulate their overall transcript volumes to perform their physiological functions. However, it is unknown whether they additionally coregulate their transcript diversities. We studied the reliability, consistency and functional associations of co-splicing correlations of genes of interest across two independent studies, multiple tissues and two statistical methods. We thoroughly investigated the reproducibility of co-splicing correlations of APP, a candidate gene for Alzheimer’s disease (AD). We then studied how co-splicing correlations in different tissues contributed to predicting functional interactions of three other genes, and finally computed co-splicing frequency for 17 thousand genes across 52 human tissues. Results We replicated co-splicing correlations between APP and 5 AD-related genes and reproduced the expected enrichment of APP co-splicing in synaptic vesicle cycle and proteasome pathways. We observed novel associations for tissue vulnerability to disease with enrichment in APP co-splicing, co-expression and epistasis in AD. APP co-splicing was the strongest predictor and replicated between studies. We confirmed known gene interactions of PRPF8 and GRIA1 in testis and brain cortex, and observed a novel interaction of FGFR2, in breast and prostate, modulated by cancer risk variants. We produced a co-splicing map across 52 human tissues to help predict the function of over 17 thousand genes. Conclusions We show that coregulation of transcript diversities provides novel biological insights into gene physiology and helps to interpret GWAS results. Co-splicing correlations are reliable and frequent and should be further pursued to help predict gene function. Our results additionally support current AD interventions targeting the ubiquitin-proteasome pathway but reveal the need to consider transcript diversity in addition to volume to assess treatment response and susceptibility to the disease.
34
Vargas JE, Kubesch N, Hernández-Ferrer C, Carrasco-Turigas G, Bustamante M, Nieuwenhuijsen M, González JR. A systemic approach to identify signaling pathways activated during short-term exposure to traffic-related urban air pollution from human blood. Environ Sci Pollut Res Int 2018; 25:29572-29583. [PMID: 30141164 DOI: 10.1007/s11356-018-3009-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/14/2017] [Accepted: 08/17/2018] [Indexed: 06/08/2023]
Abstract
The molecular mechanisms that promote pathologic alterations in human physiology mediated by short-term exposure to traffic pollutants remain poorly understood. This work aimed to develop mechanistic networks to determine which specific pathways are activated by real-world exposures to traffic-related air pollution (TRAP) during rest and moderate physical activity (PA). A controlled crossover study comparing whole blood gene expression before and after short-term exposure to high and low levels of TRAP was performed together with systems biology analysis. Twenty-eight healthy volunteers aged between 21 and 53 years were recruited. These subjects were exposed for 2 h to different pollution levels (high and low TRAP levels), while either cycling or resting. The global transcriptome profile of each condition was obtained from human whole blood samples. Microarray analysis was performed to obtain differentially expressed genes (DEGs), which were used as initial input for the GeneMANIA software to obtain protein-protein interaction (PPI) networks. Two networks were found, reflecting high or low TRAP levels, which shared only 5.6% and 15.5% of their nodes, suggesting that specific cell signaling pathways are activated in each environmental condition. However, gene ontology analysis of each PPI network suggests that both levels of TRAP regulate common members of the NF-κB signaling pathway. Our work provides the first approach describing mechanistic networks to understand TRAP effects at the system level.
35
Vilor-Tejedor N, Alemany S, Cáceres A, Bustamante M, Mortamais M, Pujol J, Sunyer J, González JR. Sparse multiple factor analysis to integrate genetic data, neuroimaging features, and attention-deficit/hyperactivity disorder domains. Int J Methods Psychiatr Res 2018; 27:e1738. [PMID: 30105890 PMCID: PMC6877273 DOI: 10.1002/mpr.1738] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/05/2017] [Revised: 05/17/2018] [Accepted: 06/26/2018] [Indexed: 11/09/2022]
Abstract
OBJECTIVES We proposed the application of a multivariate cross-sectional framework based on a combination of a variable selection method and multiple factor analysis (MFA) in order to identify complex, meaningful biological signals related to attention-deficit/hyperactivity disorder (ADHD) symptoms and hyperactivity/inattention domains. METHODS The study included 135 children from the general population with genomic and neuroimaging data. ADHD symptoms were assessed using a questionnaire based on ADHD-DSM-IV criteria. In all analyses, the raw sum scores of the hyperactivity and inattention domains and total ADHD were used. The analytical framework comprised two steps. First, a zero-inflated negative binomial (ZINB) regression model fitted by penalized maximum likelihood (LASSO-ZINB) was applied. Second, the most predictive features obtained with LASSO-ZINB were used as input for the MFA. RESULTS We observed significant relationships of ADHD symptoms and the hyperactivity and inattention domains with white matter, gray matter regions, and cerebellum, as well as with loci within chromosome 1. CONCLUSIONS Multivariate methods can be used to advance the neurobiological characterization of complex diseases, improving statistical power with respect to univariate methods and allowing the identification of meaningful biological signals in Imaging Genetic studies.
36
Vilor-Tejedor N, Alemany S, Cáceres A, Bustamante M, Pujol J, Sunyer J, González JR. Strategies for integrated analysis in imaging genetics studies. Neurosci Biobehav Rev 2018; 93:57-70. [PMID: 29944960 DOI: 10.1016/j.neubiorev.2018.06.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/29/2017] [Revised: 04/30/2018] [Accepted: 06/15/2018] [Indexed: 02/06/2023]
Abstract
Imaging Genetics (IG) integrates neuroimaging and genomic data from the same individual, deepening our knowledge of the biological mechanisms behind neurodevelopmental domains and neurological disorders. Although the literature on IG has grown exponentially over the past years, most studies have analyzed associations between candidate brain regions and individual genetic variants. However, this strategy is not designed to deal with the complexity of the neurobiological mechanisms underlying behavioral and neurodevelopmental domains. Moreover, the larger sample sizes and increased multidimensionality of this type of data represent a challenge for standardizing modeling procedures in IG research. This review provides a systematic update of the methods and strategies currently used in IG studies and serves as an analytical framework for researchers working in this field. To complement the functionalities of the Neuroconductor framework, we also describe existing R packages that implement these methodologies. In addition, we present an overview of how these methodological approaches are applied in integrating neuroimaging and genetic data.
37
Gutiérrez-Sacristán A, Hernández-Ferrer C, González JR, Furlong LI. psygenet2r: a R/Bioconductor package for the analysis of psychiatric disease genes. Bioinformatics 2017; 33:4004-4006. [PMID: 28961763 PMCID: PMC5860088 DOI: 10.1093/bioinformatics/btx506] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/10/2017] [Revised: 08/04/2017] [Accepted: 08/08/2017] [Indexed: 11/12/2022]
Abstract
MOTIVATION Psychiatric disorders have a great impact on morbidity and mortality. Genotype-phenotype resources for psychiatric diseases are key to enabling the translation of research findings into better patient care. PsyGeNET is a knowledge resource on psychiatric diseases and their genes, developed by text mining and curated by domain experts. RESULTS We present psygenet2r, an R package that contains a variety of functions for leveraging the PsyGeNET database and facilitating its analysis and interpretation. The package offers different types of queries to the database along with a variety of analysis and visualization tools, including the study of the anatomical structures in which the genes are expressed and gaining insight into the genes' molecular functions. psygenet2r is especially suited for network medicine analysis of psychiatric disorders. AVAILABILITY AND IMPLEMENTATION The package is implemented in R and is available under MIT license from Bioconductor (http://bioconductor.org/packages/release/bioc/html/psygenet2r.html). CONTACT juanr.gonzalez@isglobal.org or laura.furlong@upf.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
38
Ruiz-Arenas C, González JR. Redundancy analysis allows improved detection of methylation changes in large genomic regions. BMC Bioinformatics 2017; 18:553. [PMID: 29237399 PMCID: PMC5729265 DOI: 10.1186/s12859-017-1986-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/24/2017] [Accepted: 12/05/2017] [Indexed: 01/12/2023]
Abstract
BACKGROUND DNA methylation is an epigenetic process that regulates gene expression. Methylation can be modified by environmental exposures, and changes in methylation patterns have been associated with diseases. Methylation microarrays measure methylation levels at more than 450,000 CpGs in a single experiment, and the most common analysis strategy is single-probe analysis to find methylation probes associated with the outcome of interest. However, methylation changes usually occur at the regional level: for example, genomic structural variants can affect methylation patterns in regions up to several megabases in length. Existing methods return lists of differentially methylated regions (DMRs) only up to a few kilobases in length, and cannot test whether a given target region is differentially methylated. Therefore, these methods are not suitable to evaluate methylation changes in large regions. To address these limitations, we developed a new DMR approach based on redundancy analysis (RDA) that assesses whether a target region is differentially methylated. RESULTS Using simulated and real datasets, we compared our approach to three common DMR detection methods (Bumphunter, blockFinder, and DMRcate). We found that Bumphunter underestimated methylation changes and blockFinder showed poor performance. DMRcate showed poor power in the simulated datasets and low specificity in the real data analysis. Our method showed very high performance in all simulation settings, even with small sample sizes and subtle methylation changes, while controlling type I error. Other advantages of our method are: 1) it estimates the degree of association between the DMR and the outcome; 2) it can analyze a targeted region of interest; and 3) it can evaluate the simultaneous effects of different variables. The proposed methodology is implemented in MEAL, a Bioconductor package designed to facilitate the analysis of methylation data.
CONCLUSIONS We propose a multivariate approach to determine whether an outcome of interest alters the methylation pattern of a region of interest. The method is designed to analyze large target genomic regions and outperforms the three most popular methods for detecting DMRs. Our method can evaluate factors with more than two levels or the simultaneous effect of more than one continuous variable, which is not possible with state-of-the-art methods.
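The redundancy-analysis idea, asking how much of a whole region's CpG variance is explained by the outcome, can be sketched in a few lines of Python. This is a simplified illustration, not MEAL's implementation: it assumes a single covariate with no adjustment variables, uses a plain permutation test for significance, and the function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def rda_r2(cpgs: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Share of the region's total CpG variance explained by covariate x.

    cpgs holds one row per sample and one column per CpG. With a single
    predictor plus intercept, the fitted sum of squares of CpG j is
    Sxy_j**2 / Sxx, and the RDA R^2 is the sum of fitted SS over the sum
    of total SS across all CpGs in the region.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has zero variance")
    fitted_ss = 0.0
    total_ss = 0.0
    for j in range(len(cpgs[0])):
        col = [row[j] for row in cpgs]
        y_mean = sum(col) / n
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, col))
        fitted_ss += sxy ** 2 / sxx
        total_ss += sum((yi - y_mean) ** 2 for yi in col)
    return fitted_ss / total_ss


def rda_permutation_p(
    cpgs: Sequence[Sequence[float]],
    x: Sequence[float],
    n_perm: int = 999,
    seed: int = 1,
) -> Tuple[float, float]:
    """Observed RDA R^2 and a permutation p-value obtained by shuffling x,
    which breaks any sample-level link between covariate and region."""
    observed = rda_r2(cpgs, x)
    rng = random.Random(seed)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if rda_r2(cpgs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the statistic pools variance over every CpG in the region, the region is tested as a single unit rather than probe by probe, which is what makes this style of analysis suited to large target regions. The R^2 also serves directly as an effect-size measure for the region-outcome association.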
39
Bustamante M, Hernandez-Ferrer C, Sarria Y, Harrison GI, Nonell L, Kang W, Friedländer MR, Estivill X, González JR, Nieuwenhuijsen M, Young AR. The acute effects of ultraviolet radiation on the blood transcriptome are independent of plasma 25OHD3. Environ Res 2017; 159:239-248. [PMID: 28822308 DOI: 10.1016/j.envres.2017.07.045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/07/2017] [Revised: 07/05/2017] [Accepted: 07/25/2017] [Indexed: 06/07/2023]
Abstract
The molecular basis of many health outcomes attributed to solar ultraviolet radiation (UVR) is unknown. We tested the hypothesis that they may originate from transcriptional changes in blood cells. This was determined by assessing the effect of fluorescent solar simulated radiation (FSSR) on the transcriptional profile of peripheral blood before and 6 h, 24 h and 48 h after exposure in nine healthy volunteers. Expression of 20 genes was down-regulated and one was up-regulated at 6 h after FSSR. All recovered to baseline expression at 24 h or 48 h. These genes have been associated with immune regulation, cancer and blood pressure, health effects that have been attributed to vitamin D via solar UVR exposure. Plasma 25-hydroxyvitamin D3 [25OHD3] levels increased over time after FSSR and were maximal at 48 h. The increase was more pronounced in participants with low basal 25OHD3 levels. Mediation analyses suggested that changes in gene expression due to FSSR were independent of 25OHD3 and blood cell subpopulations.
40
Barrera-Gómez J, Agier L, Portengen L, Chadeau-Hyam M, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen M, Vineis P, Vrijheid M, Vermeulen R, Slama R, Basagaña X. A systematic comparison of statistical methods to detect interactions in exposome-health associations. Environ Health 2017; 16:74. [PMID: 28709428 PMCID: PMC5513197 DOI: 10.1186/s12940-017-0277-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 11/10/2016] [Accepted: 06/11/2017] [Indexed: 05/20/2023]
Abstract
BACKGROUND There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging owing to the large number of exposures, several of which are highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions. METHODS Simulations were based on an exposome including 237 exposures with a realistic correlation structure. We considered several regression-based statistical methods: the two-step Environment-Wide Association Study (EWAS2), the Deletion/Substitution/Addition (DSA) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO), Group-Lasso INTERaction-NET (GLINTERNET), a three-step method based on regression trees, and Boosted Regression Trees (BRT). We assessed the performance of each method in terms of model size, predictive ability, sensitivity and false discovery rate. RESULTS GLINTERNET and DSA had better overall performance than the other methods, with GLINTERNET having better properties in terms of selecting the true predictors (sensitivity) and of predictive ability, while DSA had a lower number of false positives. In terms of ability to capture interaction terms, GLINTERNET and DSA again had the best performance, with the same trade-off between sensitivity and false discovery proportion. When GLINTERNET and DSA failed to select an exposure truly associated with the outcome, they tended to select a highly correlated one. When interactions were not present in the data, using variable selection methods that allowed for interactions had only slight costs in performance compared to methods that only searched for main effects. 
CONCLUSIONS GLINTERNET and DSA provided better performance in detecting two-way interactions, compared to other existing methods.
41
Serra-Juhé C, Martos-Moreno GÁ, Bou de Pieri F, Flores R, González JR, Rodríguez-Santiago B, Argente J, Pérez-Jurado LA. Novel genes involved in severe early-onset obesity revealed by rare copy number and sequence variants. PLoS Genet 2017; 13:e1006657. [PMID: 28489853 PMCID: PMC5443539 DOI: 10.1371/journal.pgen.1006657] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/01/2016] [Revised: 05/24/2017] [Accepted: 02/26/2017] [Indexed: 12/26/2022]
Abstract
Obesity is a multifactorial disorder with high heritability (50–75%), which is probably higher in early-onset and severe cases. Although rare monogenic forms and several genes and regions of susceptibility, including copy number variants (CNVs), have been described, the genetic causes underlying the disease still remain largely unknown. We searched for rare CNVs (>100 kb in size, altering genes and present in <1/2000 population controls) in 157 Spanish children with non-syndromic early-onset obesity (EOO: body mass index >3 standard deviations above the mean at <3 years of age) using SNP array molecular karyotypes. We then performed case-control studies (480 EOO cases/480 non-obese controls) with the validated CNVs and with rare sequence variants (RSVs) detected by targeted resequencing of selected CNV genes (n = 14), and also studied the inheritance patterns in available first-degree relatives. A higher burden of gain-type CNVs was detected in EOO cases versus controls (OR = 1.71, p-value = 0.0358). In addition to a gain of the NPY gene in a familial case with EOO and attention deficit hyperactivity disorder, likely pathogenic CNVs included gains of glutamate receptors (GRIK1, GRM7) and the X-linked gastrin-peptide receptor (GRPR), all inherited from obese parents. Putatively functional RSVs absent in controls were also identified in EOO cases at NPY, GRIK1 and GRPR. A patient with a heterozygous deletion disrupting two contiguous and related genes, SLCO4C1 and SLCO6A1, also had a missense RSV at SLCO4C1 on the other allele, suggestive of a recessive model. The genes identified showed a clear enrichment of shared co-expression partners with known genes strongly related to obesity, reinforcing their role in the pathophysiology of the disease. Our data reveal a higher burden of rare CNVs and RSVs in several related genes in patients with EOO compared to controls, and implicate NPY, GRPR, two glutamate receptors and SLCO4C1 in highly penetrant forms of familial obesity. 
Although there is strong evidence for a high genetic component of obesity, the underlying genetic causes are largely unknown, mostly due to the highly heterogeneous nature of the disorder. In this work, we focused on the most severe end of the spectrum, severe obesity with onset in early childhood, which is more likely to be due to genetic alterations. We screened a sample of 157 Spanish children with early-onset obesity for rare copy number variants (CNVs) using molecular karyotypes, and then studied the genes altered by CNVs in 480 cases and 480 non-obese controls. We identified a higher burden of gain-type CNVs in cases, as well as several CNVs and sequence variants that were specific to the obese population. Interestingly, the genes identified shared co-expression partners with known obesity genes. Among these, the genes encoding neuropeptide Y (NPY), two glutamate receptors (GRIK1, GRM7), the X-linked gastrin-peptide receptor (GRPR) and the organic anion transporter SLCO4C1 are novel obesity candidate genes that may contribute to highly penetrant forms of familial obesity.
42
Albuquerque D, Manco L, González LM, Gervasini G, Benito GM, González JR, Rodríguez-López R. Polymorphisms in the SNRPN gene are associated with obesity susceptibility in a Spanish population. J Gene Med 2017; 19. [PMID: 28387446 DOI: 10.1002/jgm.2956] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/10/2017] [Revised: 03/15/2017] [Accepted: 04/04/2017] [Indexed: 12/12/2022]
Abstract
BACKGROUND SNRPN, which codes for the RNA-binding SmN protein, is a candidate gene for Prader-Willi syndrome. One characteristic of this neuroendocrine disorder is hyperphagia resulting in extreme obesity later in life. In the present study, we aimed to assess whether variability within this gene could be implicated in obesity susceptibility. METHODS A case-control study was performed including 265 unrelated patients with nonsyndromic and early-onset severe obesity, belonging to high-risk obesity families of Spanish ancestry, and 184 sex-matched healthy controls representative of the same genetic background. Forty-nine single nucleotide polymorphisms (SNPs) spanning the entire SNRPN gene were selected and genotyped using the Sequenom MassARRAY platform (Sequenom Inc., San Diego, CA, USA). RESULTS Four SNPs, rs12905653, rs752874, rs1391516 and rs2047433, were nominally associated with obesity (p < 0.03). Comparison of haplotype distributions between cases and controls identified the combination rs12905653-T/rs8028366-A/rs4028395-T as being strongly and inversely associated with obesity (odds ratio = 0.49; p = 0.0006). A genetic risk score was built from the rs12905653, rs1391516 and rs2047433 SNPs, and each unit increase in the genetic risk score increased the obesity risk by 49% (odds ratio = 1.49, 95% confidence interval = 1.24-1.80). CONCLUSIONS To our knowledge, this is the first study reporting an association between variability in the SNRPN gene and the risk of being obese. Interestingly, it was the major allele of each SNP that was associated with the risk of weight gain. Further studies analyzing this locus and the possible additive deleterious effect of SNP combinations could help to clarify its role in the development of obesity.
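The abstract reports an odds ratio of 1.49 per unit of a three-SNP genetic risk score but does not give the scoring rule. The sketch below assumes an unweighted count of risk alleles (0, 1 or 2 per SNP, so scores range from 0 to 6) and a logistic model in which the log-odds are linear in the score; under that assumption the odds ratios for successive units multiply. The genotypes and the allele coding are hypothetical.

```python
from math import exp, log

# SNPs named in the abstract; the risk-allele coding used here is hypothetical.
GRS_SNPS = ("rs12905653", "rs1391516", "rs2047433")
OR_PER_UNIT = 1.49  # reported odds ratio per unit increase in the score


def genetic_risk_score(risk_allele_counts):
    """Assumed unweighted score: sum of risk-allele counts (0, 1 or 2) per SNP."""
    score = 0
    for snp in GRS_SNPS:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count
    return score


def odds_ratio_vs_reference(score, reference=0, or_per_unit=OR_PER_UNIT):
    """Log-odds linear in the score, so the OR for k units is or_per_unit ** k."""
    return exp((score - reference) * log(or_per_unit))


person = {"rs12905653": 2, "rs1391516": 1, "rs2047433": 2}  # hypothetical genotypes
s = genetic_risk_score(person)
print(s, round(odds_ratio_vs_reference(s), 2))  # prints: 5 7.34
```

The multiplicative extrapolation to a five-unit difference holds only if the per-unit effect is constant across the whole score range, which the abstract does not test.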
43
Hernandez-Ferrer C, Ruiz-Arenas C, Beltran-Gomila A, González JR. MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration. BMC Bioinformatics 2017; 18:36. [PMID: 28095799 PMCID: PMC5240259 DOI: 10.1186/s12859-016-1455-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/11/2016] [Accepted: 12/24/2016] [Indexed: 12/01/2022]
Abstract
BACKGROUND The reduced cost of genomic assays has generated large amounts of biomedical data. As a result, current studies perform multiple experiments on the same subjects. While Bioconductor's methods and classes, implemented in different packages, manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages designed to integrate and visualize biological data use basic data structures without clear general methods, such as subsetting or selecting samples. RESULTS To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and incomplete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new, not yet implemented data from any biological experiment. CONCLUSIONS MultiDataSet is a suitable class for data integration within the R and Bioconductor framework.
44
Vilor-Tejedor N, Cáceres A, Pujol J, Sunyer J, González JR. Imaging genetics in attention-deficit/hyperactivity disorder and related neurodevelopmental domains: state of the art. Brain Imaging Behav 2016; 11:1922-1931. [DOI: 10.1007/s11682-016-9663-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
45
Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations. Environ Health Perspect 2016; 124:1848-1856. [PMID: 27219331 PMCID: PMC5132632 DOI: 10.1289/ehp172] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 05/26/2015] [Revised: 01/12/2016] [Accepted: 04/28/2016] [Indexed: 05/17/2023]
Abstract
BACKGROUND The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. OBJECTIVES We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. METHODS In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. RESULTS On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. CONCLUSIONS Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. 
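The two headline metrics in this comparison, sensitivity and false discovery proportion (FDP), can be computed per simulated dataset from the set of selected covariates and the set of true predictors. The sketch below uses the standard definitions; the exposure names are hypothetical, and returning an FDP of 0 when nothing is selected (and an undefined sensitivity when there are no true predictors, as in the scenarios with 0 associated covariates) is a convention assumed here rather than taken from the paper.

```python
def sensitivity_and_fdp(selected, true_predictors):
    """Sensitivity = TP / |true|; FDP = FP / |selected|."""
    selected = set(selected)
    true_predictors = set(true_predictors)
    true_pos = len(selected & true_predictors)   # correctly selected covariates
    false_pos = len(selected - true_predictors)  # selected but not truly associated
    # Assumed conventions: sensitivity undefined with no true predictors,
    # FDP = 0 when the method selects nothing.
    sensitivity = true_pos / len(true_predictors) if true_predictors else float("nan")
    fdp = false_pos / len(selected) if selected else 0.0
    return sensitivity, fdp


true_set = {"E1", "E2", "E3", "E4"}        # hypothetical true predictors
chosen = {"E1", "E2", "E3", "E7", "E9"}    # hypothetical selection by a method
sens, fdp = sensitivity_and_fdp(chosen, true_set)
print(f"sensitivity={sens:.2f}, FDP={fdp:.2f}")  # sensitivity=0.75, FDP=0.40
```

Averaging these per-dataset values across simulation runs gives summary figures of the kind quoted in the RESULTS, such as a sensitivity of 76% with an FDP of 44%.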
46
Bruzzoni-Giovanelli H, González JR, Sigaux F, Villoutreix BO, Cayuela JM, Guilhot J, Preudhomme C, Guilhot F, Poyet JL, Rousselot P. Genetic polymorphisms associated with increased risk of developing chronic myelogenous leukemia. Oncotarget 2016; 6:36269-77. [PMID: 26474455 PMCID: PMC4742176 DOI: 10.18632/oncotarget.5915] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/12/2015] [Accepted: 09/14/2015] [Indexed: 12/22/2022]
Abstract
Little is known about inherited factors associated with the risk of developing chronic myelogenous leukemia (CML). We used a dedicated DNA chip containing 16 561 single nucleotide polymorphisms (SNPs) covering 1 916 candidate genes to analyze 437 CML patients and 1 144 healthy control individuals. Single-SNP association analysis identified 139 SNPs that remained significant after correction for multiple comparisons (1% false discovery rate). The HDAC9, AVEN, SEMA3C, IKBKB, GSTA3, RIPK1 and FGF2 genes were each represented by three SNPs, the PSM family by four SNPs and the SLC15A1 gene by six. Haplotype analysis showed that certain combinations of rare alleles of these genes increased the risk of developing CML by two- to three-fold or more. A classification tree model identified five SNPs, in the genes PSMB10, TNFRSF10D, PSMB2, PPARD and CYP26B1, that were associated with CML predisposition. A CML risk-allele score was created using these five SNPs. This score discriminated CML status with an AUC of 0.61 (95% CI: 0.58-0.64). Interestingly, the score was associated with age at diagnosis, and the average number of risk alleles was significantly higher in younger patients. The risk-allele score showed the same distribution in the general population (HapMap CEU samples) as in our control individuals and was associated with differential expression of two genes (VAPA and TDRKH). In conclusion, we describe haplotypes and a genetic score that are significantly associated with a predisposition to develop CML. The SNPs identified will also serve to drive fundamental research on the putative role of these genes in CML development.
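The AUC quoted for the five-SNP risk-allele score is the probability that a randomly chosen patient has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the scores are hypothetical risk-allele counts (assuming two alleles per SNP, so 0 to 10 risk alleles), not data from the study, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_pairwise(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-allele counts for four patients and four controls.
cases = [6, 5, 7, 4]
controls = [4, 5, 3, 6]
print(auc_pairwise(cases, controls))  # 0.71875
```

An AUC of 0.5 corresponds to no discrimination, so the reported 0.61 indicates that patients tend to carry more risk alleles than controls, with substantial overlap between the two groups.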
47
Cáceres A, Esko T, Pappa I, Gutiérrez A, Lopez-Espinosa MJ, Llop S, Bustamante M, Tiemeier H, Metspalu A, Joshi PK, Wilson JF, Reina-Castillón J, Shin J, Pausova Z, Paus T, Sunyer J, Pérez-Jurado LA, González JR. Ancient Haplotypes at the 15q24.2 Microdeletion Region Are Linked to Brain Expression of MAN2C1 and Children's Intelligence. PLoS One 2016; 11:e0157739. [PMID: 27355585 PMCID: PMC4927142 DOI: 10.1371/journal.pone.0157739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2016] [Accepted: 06/05/2016] [Indexed: 11/26/2022]
Abstract
The chromosome bands 15q24.1-15q24.3 contain a complex region with numerous segmental duplications that predispose to regional microduplications and microdeletions, both of which have been linked to intellectual disability, speech delay and autistic features. The region may also harbour common inversion polymorphisms whose functional and phenotypic manifestations are unknown. Using single nucleotide polymorphism (SNP) data, we detected four large contiguous haplotype-genotypes at 15q24 showing Mendelian inheritance in 2,562 trios, an African origin, high population stratification and reduced recombination rates. Although the haplotype-genotypes were most likely generated by decreased or absent recombination among them, we could not confirm that they were the product of inversion polymorphisms in the region. One of the blocks was composed of three haplotype-genotypes (N1a, N1b and N2), which significantly correlated with intelligence quotient (IQ) in 2,735 children of European ancestry from three independent population cohorts. Homozygosity for N2 was associated with lower verbal IQ (2.4-point loss, p-value = 0.01), while homozygosity for N1b was associated with a 3.2-point loss in non-verbal IQ (p-value = 0.0006). The three alleles strongly correlated with expression levels of MAN2C1 and SNUPN in blood and brain. Homozygosity for N2 correlated with over-expression of MAN2C1 across many brain areas except the occipital cortex, where N1b homozygotes showed strong under-expression. Our population-based analyses suggest that MAN2C1 may contribute to the verbal difficulties observed in microduplications and to the intellectual disability of microdeletion syndromes, whose characteristic dosage increase and loss may affect different brain areas.
48
Cáceres A, Vargas JE, González JR. APOE and MS4A6A interact with GnRH signaling in Alzheimer's disease: Enrichment of epistatic effects. Alzheimers Dement 2016; 13:493-497. [DOI: 10.1016/j.jalz.2016.05.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/16/2015] [Revised: 05/04/2016] [Accepted: 05/22/2016] [Indexed: 10/21/2022]
49
Artigas MS, Wain LV, Miller S, Kheirallah AK, Huffman JE, Ntalla I, Shrine N, Obeidat M, Trochet H, McArdle WL, Alves AC, Hui J, Zhao JH, Joshi PK, Teumer A, Albrecht E, Imboden M, Rawal R, Lopez LM, Marten J, Enroth S, Surakka I, Polasek O, Lyytikäinen LP, Granell R, Hysi PG, Flexeder C, Mahajan A, Beilby J, Bossé Y, Brandsma CA, Campbell H, Gieger C, Gläser S, González JR, Grallert H, Hammond CJ, Harris SE, Hartikainen AL, Heliövaara M, Henderson J, Hocking L, Horikoshi M, Hutri-Kähönen N, Ingelsson E, Johansson Å, Kemp JP, Kolcic I, Kumar A, Lind L, Melén E, Musk AW, Navarro P, Nickle DC, Padmanabhan S, Raitakari OT, Ried JS, Ripatti S, Schulz H, Scott RA, Sin DD, Starr JM, Viñuela A, Völzke H, Wild SH, Wright AF, Zemunik T, Jarvis DL, Spector TD, Evans DM, Lehtimäki T, Vitart V, Kähönen M, Gyllensten U, Rudan I, Deary IJ, Karrasch S, Probst-Hensch NM, Heinrich J, Stubbe B, Wilson JF, Wareham NJ, James AL, Morris AP, Jarvelin MR, Hayward C, Sayers I, Strachan DP, Hall IP, Tobin MD. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation. Nat Commun 2015; 6:8658. [PMID: 26635082 PMCID: PMC4686825 DOI: 10.1038/ncomms9658] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 02/05/2015] [Accepted: 09/17/2015] [Indexed: 01/11/2023]
Abstract
Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and of the pulmonary diseases in which they are altered.
50
Hernandez-Ferrer C, Quintela Garcia I, Danielski K, Carracedo Á, Pérez-Jurado LA, González JR. affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling. BMC Bioinformatics 2015; 16:167. [PMID: 25991004 PMCID: PMC4438530 DOI: 10.1186/s12859-015-0608-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/24/2014] [Accepted: 04/30/2015] [Indexed: 12/02/2022]
Abstract
Background Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Other structural variants, such as copy number variants or DNA inversions, either germ-line or arising as mosaic events, are now being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) in structural variant studies. Results We illustrate the capabilities of affy2sv using two complete pipelines on real data: the first performs a GWAS and a mosaic alteration detection study, and the second detects CNVs and performs inversion calling. Conclusion Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies where different types of structural variants are considered. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0608-y) contains supplementary material, which is available to authorized users.