1
Benjamin KJM, Chen Q, Jaffe AE, Stolz JM, Collado-Torres L, Huuki-Myers LA, Burke EE, Arora R, Feltrin AS, Barbosa AR, Radulescu E, Pergola G, Shin JH, Ulrich WS, Deep-Soboslay A, Tao R, Hyde TM, Kleinman JE, Erwin JA, Weinberger DR, Paquola ACM. Analysis of the caudate nucleus transcriptome in individuals with schizophrenia highlights effects of antipsychotics and new risk genes. Nat Neurosci 2022; 25:1559-1568. [PMID: 36319771; PMCID: PMC10599288; DOI: 10.1038/s41593-022-01182-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/25/2021] [Accepted: 09/13/2022] [Indexed: 11/06/2022]
Abstract
Most studies of gene expression in the brains of individuals with schizophrenia have focused on cortical regions, but subcortical nuclei such as the striatum are prominently implicated in the disease, and current antipsychotic drugs target the striatum's dense dopaminergic innervation. Here, we performed a comprehensive analysis of the genetic and transcriptional landscape of schizophrenia in the postmortem caudate nucleus of the striatum of 443 individuals (245 neurotypical individuals, 154 individuals with schizophrenia and 44 individuals with bipolar disorder), 210 from African and 233 from European ancestries. Integrating expression quantitative trait loci analysis, Mendelian randomization with the latest schizophrenia genome-wide association study, transcriptome-wide association study and differential expression analysis, we identified many genes associated with schizophrenia risk, including potentially the dopamine D2 receptor short isoform. We found that antipsychotic medication has an extensive influence on caudate gene expression. We constructed caudate nucleus gene expression networks that highlight interactions involving schizophrenia risk. These analyses provide a resource for the study of schizophrenia and insights into risk mechanisms and potential therapeutic targets.
Affiliation(s)
- Kynon J M Benjamin
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qiang Chen
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Andrew E Jaffe
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Neumora Therapeutics, Watertown, MA, USA
- Joshua M Stolz
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Leonardo Collado-Torres
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Emily E Burke
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Ria Arora
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Arthur S Feltrin
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Center for Mathematics, Computation and Cognition, Federal University of ABC, Santo André, Brazil
- André Rocha Barbosa
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Inter-Institutional Graduate Program on Bioinformatics, University of São Paulo, São Paulo, Brazil
  - Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
- Giulio Pergola
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Joo Heon Shin
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ran Tao
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Thomas M Hyde
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Joel E Kleinman
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jennifer A Erwin
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Daniel R Weinberger
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Apuã C M Paquola
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
2
Zhang Y, Li J, Feng D, Peng X, Wang B, Han T, Zhang Y. Systematic Analysis of Molecular Characterization and Clinical Relevance of Liquid–Liquid Phase Separation Regulators in Digestive System Neoplasms. Front Cell Dev Biol 2022; 9:820174. [PMID: 35252219; PMCID: PMC8891544; DOI: 10.3389/fcell.2021.820174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 12/21/2021] [Indexed: 01/02/2023]
Abstract
Background: The role of liquid–liquid phase separation (LLPS) in cancer has attracted increasing attention. LLPS affects transcriptional regulation, the maintenance of genomic stability and signal transduction, and contributes to the occurrence and progression of tumors. However, the role of LLPS in digestive system tumors is still largely unknown. Results: Here, we characterized the expression profiles of LLPS regulators in three digestive tract tumor types (COAD, STAD, and ESCA) using data from The Cancer Genome Atlas (TCGA). Our results showed for the first time that LLPS regulators, such as BRD4, FBN1, and TP53, were frequently mutated in all types of digestive system tumors. Variant allele frequency (VAF) and APOBEC analysis demonstrated that genetic alterations of LLPS regulators, such as TP53, NPHS1, TNRC6B, ITSN1, TNPO1, PML, AR, BRD4, DLG4, and PTPN1, were related to the progression of digestive system neoplasms (DSNs). KM plotter analysis showed that the mutation status of LLPS regulators was significantly related to the overall survival (OS) time of DSNs, indicating that these regulators may contribute to the progression of DSNs. Expression analysis showed that a variety of LLPS regulators, such as SYN2 and MAPT, were significantly dysregulated in digestive system tumors. Notably, we found for the first time that LLPS regulators were significantly correlated with tumor immune infiltration of B cells, CD4+ T cells, and CD8+ T cells in digestive system tumors. Bioinformatics analysis showed that the expression of LLPS regulators was closely related to multiple signaling pathways, including the ErbB signaling pathway and the T-cell receptor signaling pathway. Several LLPS signatures were constructed and showed strong prognostic stratification ability in different digestive gland tumors. Finally, the LLPS regulators' signature score was significantly positively related to the infiltration levels of CD4+ T cells, neutrophils, macrophages, and CD8+ T cells. Conclusion: Our study shows for the first time the potential roles of LLPS regulators in carcinogenesis and provides novel insights for identifying biomarkers for immunotherapy and prognosis prediction in DSNs.
Affiliation(s)
- Yaxin Zhang
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China
- Jie Li
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China
- Dan Feng
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China
- Xiaobo Peng
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China
- Bin Wang
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China
- Ting Han
  - Department of General Surgery, Changhai Hospital, Naval Medical University, Shanghai, China
- Yingyi Zhang
  - Department of Oncology, Changhai Hospital, Naval Medical University, Shanghai, China

*Correspondence: Bin Wang, Ting Han, Yingyi Zhang
3
Scherer M, Gasparoni G, Rahmouni S, Shashkova T, Arnoux M, Louis E, Nostaeva A, Avalos D, Dermitzakis ET, Aulchenko YS, Lengauer T, Lyons PA, Georges M, Walter J. Identification of tissue-specific and common methylation quantitative trait loci in healthy individuals using MAGAR. Epigenetics Chromatin 2021; 14:44. [PMID: 34530905; PMCID: PMC8444396; DOI: 10.1186/s13072-021-00415-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Accepted: 08/02/2021] [Indexed: 12/18/2022]
Abstract
Background: Understanding the influence of genetic variants on DNA methylation is fundamental for the interpretation of epigenomic data in the context of disease. There is a need for systematic approaches not only for determining methylation quantitative trait loci (methQTL), but also for discriminating general from cell type-specific effects. Results: Here, we present a two-step computational framework, MAGAR (https://bioconductor.org/packages/MAGAR), which fully supports the identification of methQTLs from matched genotyping and DNA methylation data and additionally allows cell type-specific methQTL effects to be identified. In a pilot analysis, we apply MAGAR to data from four tissues (ileum, rectum, T cells, B cells) from healthy individuals and demonstrate the discrimination of common from cell type-specific methQTLs. We experimentally validate both types of methQTLs in an independent data set comprising additional cell types and tissues. Finally, we validate selected methQTLs located in the PON1, ZNF155, and NRG2 genes by ultra-deep local sequencing. In line with previous reports, we find cell type-specific methQTLs to be preferentially located in enhancer elements. Conclusions: Our analysis demonstrates that a systematic analysis of methQTLs provides important new insights into the influence of genetic variants on cell type-specific epigenomic variation. Supplementary Information: The online version contains supplementary material available at 10.1186/s13072-021-00415-6.
Affiliation(s)
- Michael Scherer
  - Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
  - Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Gilles Gasparoni
  - Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Souad Rahmouni
  - Unit of Animal Genomics, GIGA-Institute & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Tatiana Shashkova
  - Kurchatov Genomics Center of the Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Research and Training Center on Bioinformatics, A.A. Kharkevich Institute for Information Transmission Problems, Moscow, Russia
- Marion Arnoux
  - Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Edouard Louis
  - Department of Gastroenterology, Liège University Hospital, CHU Liège, Liège, Belgium
- Diana Avalos
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics (SIB), University of Geneva, Geneva, Switzerland
  - Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland
- Emmanouil T Dermitzakis
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics (SIB), University of Geneva, Geneva, Switzerland
  - Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland
- Yurii S Aulchenko
  - Kurchatov Genomics Center of the Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
  - Moscow Institute of Physics and Technology (State University), Moscow, Russia
  - PolyKnomics BV, 's-Hertogenbosch, The Netherlands
- Thomas Lengauer
  - Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Paul A Lyons
  - Department of Medicine, University of Cambridge School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
  - Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge, CB2 0AW, UK
- Michel Georges
  - Unit of Animal Genomics, GIGA-Institute & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Jörn Walter
  - Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
4
Vogel I, Blanshard RC, Hoffmann ER. SureTypeSC-a Random Forest and Gaussian mixture predictor of high confidence genotypes in single-cell data. Bioinformatics 2020; 35:5055-5062. [PMID: 31116387; DOI: 10.1093/bioinformatics/btz412] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/14/2018] [Revised: 04/08/2019] [Accepted: 05/21/2019] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Accurate genotyping of DNA from a single cell is required for applications such as de novo mutation detection, linkage analysis and lineage tracing. However, achieving high-precision genotyping in the single-cell environment is challenging due to the errors caused by whole-genome amplification. Two factors make genotyping from single cells using single nucleotide polymorphism (SNP) arrays challenging: the lack of a comprehensive single-cell dataset with a reference genotype, and the absence of genotyping tools specifically designed to detect noise from the whole-genome amplification step. Algorithms designed for bulk DNA genotyping cause significant data loss when used for single-cell applications. RESULTS: In this study, we have created a resource of 28.7 million SNPs, typed at high confidence from whole-genome amplified DNA from single cells using the Illumina SNP bead array technology. The resource is generated from 104 single cells from two cell lines that are available from the Coriell repository. We used mother-father-proband (trio) information from multiple technical replicates of bulk DNA to establish a high-quality reference genotype for the two cell lines on the SNP array. This enabled us to develop SureTypeSC, a two-stage machine learning algorithm that filters a substantial part of the noise while retaining the majority of the high-quality SNPs. SureTypeSC also provides a simple statistical output to show the confidence of a particular single-cell genotype using Bayesian statistics. AVAILABILITY AND IMPLEMENTATION: The implementation of SureTypeSC in Python and sample data are available in the GitHub repository: https://github.com/puko818/SureTypeSC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ivan Vogel
  - DNRF Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark
  - Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Robert C Blanshard
  - Illumina Cambridge Ltd., Fulbourn, UK
  - Genome Damage and Stability Centre, School of Life Sciences, University of Sussex, Brighton, UK
- Eva R Hoffmann
  - DNRF Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark
  - Genome Damage and Stability Centre, School of Life Sciences, University of Sussex, Brighton, UK
5
Feliciano P, Zhou X, Astrovskaya I, Turner TN, Wang T, Brueggeman L, Barnard R, Hsieh A, Snyder LG, Muzny DM, Sabo A, Gibbs RA, Eichler EE, O’Roak BJ, Michaelson JJ, Volfovsky N, Shen Y, Chung WK. Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes. NPJ Genom Med 2019; 4:19. [PMID: 31452935; PMCID: PMC6707204; DOI: 10.1038/s41525-019-0093-8] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 12/20/2018] [Accepted: 07/11/2019] [Indexed: 12/30/2022]
Abstract
Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. We conducted a pilot study for SPARK (SPARKForAutism.org) of 457 families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. We identified variants in genes and loci that are clinically recognized causes or significant contributors to ASD in 10.4% of families without previous genetic findings. In addition, we identified variants that are possibly associated with ASD in an additional 3.4% of families. A meta-analysis using the TADA framework at a false discovery rate (FDR) of 0.1 provides statistical support for 26 ASD risk genes. While most of these genes are already known ASD risk genes, BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD (p-value = 2.3e-06). Future studies leveraging the thousands of individuals with ASD who have enrolled in SPARK are likely to further clarify the genetic risk factors associated with ASD and to accelerate ASD research that incorporates genetic etiology.
Affiliation(s)
- Xueya Zhou
  - Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Tychele N. Turner
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Tianyun Wang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Leo Brueggeman
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Rebecca Barnard
  - Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Alexander Hsieh
  - Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Donna M. Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Aniko Sabo
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195 USA
- Brian J. O’Roak
  - Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Jacob J. Michaelson
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Wendy K. Chung
  - Simons Foundation, New York, NY 10010 USA
  - Department of Pediatrics, Columbia University Medical Center, New York, NY 10032 USA
6
Muñoz Garcia A, Kutmon M, Eijssen L, Hewison M, Evelo CT, Coort SL. Pathway analysis of transcriptomic data shows immunometabolic effects of vitamin D. J Mol Endocrinol 2018; 60:95-108. [PMID: 29233860; PMCID: PMC5850959; DOI: 10.1530/jme-17-0186] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 11/16/2017] [Accepted: 12/11/2017] [Indexed: 12/27/2022]
Abstract
Unbiased genomic screening analyses have highlighted novel immunomodulatory properties of the active form of vitamin D, 1,25-dihydroxyvitamin D (1,25(OH)2D). However, clearer interpretation of the resulting gene expression data is limited by cell model specificity. The aim of the current study was to provide a broader perspective on common gene regulatory pathways associated with innate immune responses to 1,25(OH)2D, through systematic re-interrogation of existing gene expression databases from multiple related monocyte models (the THP-1 monocytic cell line (THP-1), monocyte-derived dendritic cells (DCs) and monocytes). Vitamin D receptor (VDR) expression is common to multiple immune cell types, and thus, pathway analysis of gene expression using data from multiple related models provides an inclusive perspective on the immunomodulatory impact of vitamin D. A bioinformatic workflow incorporating pathway analysis using PathVisio and WikiPathways was utilized to compare each set of gene expression data based on pathway-level context. Using this strategy, pathways related to the TCA cycle, oxidative phosphorylation and ATP synthesis and metabolism were shown to be significantly regulated by 1,25(OH)2D in each of the repository models (Z-scores 3.52-8.22). Common regulation by 1,25(OH)2D was also observed for pathways associated with apoptosis and the regulation of apoptosis (Z-scores 2.49-3.81). In contrast to the primary culture DC and monocyte models, the THP-1 myelomonocytic cell line showed strong regulation of pathways associated with cell proliferation and DNA replication (Z-scores 6.1-12.6). In short, data presented here support a fundamental role for active 1,25(OH)2D as a pivotal regulator of immunometabolism.
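The pathway Z-scores quoted in this abstract summarize how strongly changed genes are over-represented in a pathway. The abstract does not state the formula, so the sketch below assumes the hypergeometric-style over-representation statistic commonly used by pathway tools. The function name `pathway_z_score` and the example counts are hypothetical and are not taken from the study.

```python
import math


def pathway_z_score(total_genes, total_changed, pathway_genes, pathway_changed):
    """Over-representation Z-score under a hypergeometric null (assumed form).

    total_genes     -- N: genes measured in the data set
    total_changed   -- R: measured genes meeting the "changed" criterion
    pathway_genes   -- n: measured genes that belong to the pathway
    pathway_changed -- r: pathway genes meeting the criterion
    """
    if total_genes < 2 or not 0 < total_changed < total_genes:
        raise ValueError("need N >= 2 and 0 < R < N")
    if not 0 < pathway_genes < total_genes:
        raise ValueError("need 0 < n < N")
    fraction = total_changed / total_genes
    expected = pathway_genes * fraction
    # Variance of a hypergeometric count, including the finite-population factor.
    variance = (
        pathway_genes
        * fraction
        * (1.0 - fraction)
        * (1.0 - (pathway_genes - 1) / (total_genes - 1))
    )
    return (pathway_changed - expected) / math.sqrt(variance)


# Hypothetical counts: 10,000 measured genes, 1,000 changed,
# a 50-gene pathway in which 15 genes changed (5 expected by chance).
example_z = pathway_z_score(10_000, 1_000, 50, 15)
```

A positive score means the pathway contains more changed genes than expected by chance; a score of zero means exactly the expected number.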
Affiliation(s)
- Amadeo Muñoz Garcia
  - Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Metabolism in Translational Research, Maastricht University, Maastricht, The Netherlands
  - Institute of Metabolism and Systems Research, The University of Birmingham, Birmingham, UK
- Martina Kutmon
  - Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Metabolism in Translational Research, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for System Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Lars Eijssen
  - Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Metabolism in Translational Research, Maastricht University, Maastricht, The Netherlands
- Martin Hewison
  - Institute of Metabolism and Systems Research, The University of Birmingham, Birmingham, UK
- Chris T Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Metabolism in Translational Research, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for System Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Susan L Coort
  - Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Metabolism in Translational Research, Maastricht University, Maastricht, The Netherlands
7
Mason AS, Higgins EE, Snowdon RJ, Batley J, Stein A, Werner C, Parkin IAP. A user guide to the Brassica 60K Illumina Infinium™ SNP genotyping array. Theor Appl Genet 2017; 130:621-633. [PMID: 28220206; DOI: 10.1007/s00122-016-2849-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/11/2016] [Accepted: 09/14/2016] [Indexed: 06/06/2023]
Abstract
The Brassica napus 60K Illumina Infinium™ SNP array has had huge international uptake in the rapeseed community due to the revolutionary speed of acquisition and ease of analysis of this high-throughput genotyping data, particularly when coupled with the newly available reference genome sequence. However, use of this valuable resource can be improved by a better understanding of the promises and pitfalls of SNP arrays. We outline how best to analyze Brassica SNP marker array data for diverse applications, including linkage and association mapping, genetic diversity and genomic introgression studies. We present data on which SNPs are locus-specific in winter, semi-winter and spring B. napus germplasm pools, rather than amplifying both an A-genome and a C-genome locus or multiple loci. We also discuss common issues that arise when analyzing array data, particularly those unique to SNP markers, and how to deal with them in practical Brassica breeding applications.
Affiliation(s)
- Annaliese S Mason
  - Department of Plant Breeding, IFZ for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
- Erin E Higgins
  - Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N0X2, Canada
- Rod J Snowdon
  - Department of Plant Breeding, IFZ for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
- Jacqueline Batley
  - School of Agriculture and Food Sciences and Centre for Integrative Legume Research, The University of Queensland, Brisbane, 4072, Australia
  - School of Plant Biology and The UWA Institute of Agriculture, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Perth, Australia
- Anna Stein
  - Department of Plant Breeding, IFZ for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
- Christian Werner
  - Department of Plant Breeding, IFZ for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
- Isobel A P Parkin
  - Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N0X2, Canada
8
Paradis E, Gosselin T, Goudet J, Jombart T, Schliep K. Linking genomics and population genetics with R. Mol Ecol Resour 2016; 17:54-66. [PMID: 27461508; DOI: 10.1111/1755-0998.12577] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/29/2016] [Revised: 07/01/2016] [Accepted: 07/19/2016] [Indexed: 11/29/2022]
Abstract
Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is helping to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very large genetic data sets collected on many individuals. We then show how R, as a computing language and development environment, offers some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.
Affiliation(s)
- Emmanuel Paradis
  - Institut des Sciences de l'Évolution, Université Montpellier - CNRS - IRD - EPHE, Place Eugène Bataillon - CC 065, 34095, Montpellier cédex 05, France
- Thierry Gosselin
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, G1V 0A6, Canada
- Jérôme Goudet
  - Department of Ecology and Evolution, Swiss Institute of Bioinformatics, Lausanne, CH-1015, Switzerland
- Thibaut Jombart
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College, London, W2 1PG, UK
- Klaus Schliep
  - Department of Biology, University of Massachusetts Boston, Boston, MA, 02125, USA
9
Li G. A new model calling procedure for Illumina BeadArray data. BMC Genet 2016; 17:90. [PMID: 27343118; PMCID: PMC4921002; DOI: 10.1186/s12863-016-0398-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/11/2016] [Accepted: 06/16/2016] [Indexed: 11/10/2022]
Abstract
Background: Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies. Many popular calling algorithms use mixture models to infer genotypes of a large number of single nucleotide polymorphisms (SNPs) in a fast and efficient way. In practice, mixture models are mostly restricted to inferring genotypes for common SNPs, whose minor allele frequencies are relatively large. However, it is still challenging to accurately genotype rare variants, especially those for which the boundaries between genotypes are not clearly defined. Results: To further improve the call accuracy and the quality of genotypes for rare variants, a new model calling procedure, named M-D, is proposed to infer genotypes for Illumina BeadArray data. In this calling procedure, a Gaussian Mixture Model and a Dirichlet Process Gaussian Mixture Model are integrated to infer genotypes. Conclusions: Applications to Illumina data illustrate that this new approach can improve calling performance compared to other popular genotyping algorithms.
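As a rough illustration of the mixture-model idea described in this abstract, the sketch below fits a plain one-dimensional Gaussian mixture by expectation-maximization and assigns each sample to a genotype cluster. It is not the M-D procedure: the Dirichlet Process component is omitted, and the simulated signal values, the starting means, the `AA`/`AB`/`BB`/`NC` labels, the 0.95 posterior cutoff and all function names are assumptions made for the example.

```python
import math
import random


def weighted_densities(x, weights, means, variances):
    """Return weight * N(x | mean, variance) for every mixture component."""
    return [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(xs, init_means, n_iter=50, min_var=1e-4):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(init_means)
    means = list(init_means)
    variances = [0.01] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = weighted_densities(x, weights, means, variances)
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty cluster: keep its previous parameters
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # floor keeps clusters from collapsing
    return weights, means, variances


def call_genotypes(xs, weights, means, variances,
                   labels=("AA", "AB", "BB"), min_posterior=0.95):
    """Assign each value to its most probable cluster, or "NC" (no call)."""
    calls = []
    for x in xs:
        dens = weighted_densities(x, weights, means, variances)
        total = sum(dens)
        if total == 0.0:
            calls.append("NC")
            continue
        best = max(range(len(dens)), key=dens.__getitem__)
        calls.append(labels[best] if dens[best] / total >= min_posterior else "NC")
    return calls


# Simulated allelic-ratio signals for one SNP with a rare BB cluster (made-up data).
random.seed(0)
signals = (
    [random.gauss(0.05, 0.03) for _ in range(60)]
    + [random.gauss(0.50, 0.03) for _ in range(30)]
    + [random.gauss(0.95, 0.03) for _ in range(10)]
)
fitted_weights, fitted_means, fitted_vars = fit_gmm_1d(signals, [0.0, 0.5, 1.0])
calls = call_genotypes([0.04, 0.50, 0.97], fitted_weights, fitted_means, fitted_vars)
```

With a fixed number of components, a cluster containing very few samples is estimated poorly, which is the rare-variant difficulty the abstract describes.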
Collapse
Affiliation(s)
- Gengxin Li
- Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Hwy, Dayton, 45435, USA.
|
10
|
Turner T, Hormozdiari F, Duyzend M, McClymont S, Hook P, Iossifov I, Raja A, Baker C, Hoekzema K, Stessman H, Zody M, Nelson B, Huddleston J, Sandstrom R, Smith J, Hanna D, Swanson J, Faustman E, Bamshad M, Stamatoyannopoulos J, Nickerson D, McCallion A, Darnell R, Eichler E. Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA. Am J Hum Genet 2016; 98:58-74. [PMID: 26749308 DOI: 10.1016/j.ajhg.2015.11.023] [Citation(s) in RCA: 189] [Impact Index Per Article: 23.6] [Received: 08/05/2015] [Accepted: 11/25/2015] [Indexed: 12/17/2022]
Abstract
We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
|
11
|
Abstract
Genotyping microarrays are an important and widely-used tool in genetics. I present argyle, an R package for analysis of genotyping array data tailored to Illumina arrays. The goal of the argyle package is to provide simple, expressive tools for nonexpert users to perform quality checks and exploratory analyses of genotyping data. To these ends, the package consists of a suite of quality-control functions, normalization procedures, and utilities for visually and statistically summarizing such data. Format-conversion tools allow interoperability with popular software packages for analysis of genetic data including PLINK, R/qtl and DOQTL. Detailed vignettes demonstrating common use cases are included as supporting information. argyle bridges the gap between the low-level tasks of quality control and high-level tasks of genetic analysis. It is freely available at https://github.com/andrewparkermorgan/argyle and has been submitted to Bioconductor.
|
12
|
M3-S: a genotype calling method incorporating information from samples with known genotypes. BMC Bioinformatics 2015; 16:403. [PMID: 26634345 PMCID: PMC4669649 DOI: 10.1186/s12859-015-0824-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2015] [Accepted: 11/02/2015] [Indexed: 11/21/2022]
Abstract
Background: A key challenge in analyzing high-throughput Single Nucleotide Polymorphism (SNP) arrays is the accurate inference of genotypes for SNPs with low minor allele frequencies. A number of calling algorithms have been developed to infer genotypes for common SNPs, but their performance in calling rare SNPs is limited. The existing algorithms can be broadly classified into three categories: population-based methods, SNP-based methods, and a hybrid of the two approaches. Despite the relatively better performance of the hybrid approach, it is still challenging to analyze rare SNPs. Results: We propose to utilize information from samples with known genotypes to develop a two-stage genotyping procedure, namely M3-S, for rare SNP calling. This new approach can improve genotyping accuracy by clearly defining the boundaries of genotype clusters from samples with known genotypes, and can increase the call rate by combining simulated data, based on the inferred genotype cluster information, with the study population. Conclusions: Applications to real data demonstrate that the new approach, M3-S, outperforms existing methods in calling rare SNPs.
|
13
|
Glidewell SC, Miyamoto SD, Grossfeld PD, Clouthier DE, Coldren CD, Stearman RS, Geraci MW. Transcriptional Impact of Rare and Private Copy Number Variants in Hypoplastic Left Heart Syndrome. Clin Transl Sci 2015; 8:682-9. [PMID: 26534787 DOI: 10.1111/cts.12340] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Hypoplastic left heart syndrome (HLHS) is a heterogeneous, lethal combination of congenital malformations characterized by severe underdevelopment of left heart structures, resulting in a univentricular circulation. The genetic determinants of this disorder are largely unknown. Evidence of copy number variants (CNVs) contributing to the genetic etiology of HLHS and other congenital heart defects has been mounting. However, the functional effects of such CNVs have not been examined, particularly in cases where the variant of interest is found in only a single patient. METHODS AND RESULTS: Whole-genome SNP microarrays were employed to detect CNVs in two patient cohorts (N = 70 total) predominantly diagnosed with some form of nonsyndromic HLHS. We discovered 16 rare or private variants adjacent to or overlapping 20 genes associated with cardiovascular or premature lethality phenotypes in mouse knockout models. We evaluated the impact of selected variants on the expression of nine of these genes through quantitative PCR on cDNA derived from patient heart tissue. Four genes displayed significantly altered expression in patients with an overlapping or proximal CNV versus patients without such CNVs. CONCLUSION: Rare and private genomic imbalances perturb transcription of genes that potentially affect cardiogenesis in a subset of nonsyndromic HLHS patients.
Affiliation(s)
- Steven C Glidewell
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA; Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Shelley D Miyamoto
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA; Department of Pediatrics, Children's Hospital Colorado, Aurora, Colorado, USA
- Paul D Grossfeld
- Department of Pediatrics, University of California, San Diego, California, USA
- David E Clouthier
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA; Department of Craniofacial Biology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Robert S Stearman
- Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Mark W Geraci
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA; Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
|
14
|
Davis B, Shen Y, Poon CC, Luchman HA, Stechishin OD, Pontifex CS, Wu W, Kelly JJ, Blough MD. Comparative genomic and genetic analysis of glioblastoma-derived brain tumor-initiating cells and their parent tumors. Neuro Oncol 2015; 18:350-60. [PMID: 26245525 DOI: 10.1093/neuonc/nov143] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/16/2014] [Accepted: 06/24/2015] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Glioblastoma (GBM) is a fatal cancer that has eluded major therapeutic advances. Failure to make progress may reflect the absence of a human GBM model that could be used to test compounds for anti-GBM activity. In this respect, the development of brain tumor-initiating cell (BTIC) cultures is a step forward because BTICs appear to capture the molecular diversity of GBM better than traditional glioma cell lines. Here, we perform a comparative genomic and genetic analysis of BTICs and their parent tumors as a preliminary evaluation of the BTIC model. METHODS: We assessed single nucleotide polymorphisms (SNPs), genome-wide copy number variations (CNVs), gene expression patterns, and molecular subtypes of 11 established BTIC lines and matched parent tumors. RESULTS: Although CNV differences were noted, BTICs retained the major genomic alterations characteristic of GBM. SNP patterns were similar between BTICs and tumors. Importantly, recurring SNP or CNV alterations specific to BTICs were not seen. Comparative gene expression analysis and molecular subtyping revealed differences between BTICs and GBMs. These differences formed the basis of a 63-gene expression signature that distinguished cells from tumors; differentially expressed genes primarily involved metabolic processes. We also derived a set of 73 similarly expressed genes; these genes were not associated with specific biological functions. CONCLUSIONS: Although not identical, established BTIC lines preserve the core molecular alterations seen in their parent tumors, as well as the genomic hallmarks of GBM, without acquiring recurring BTIC-specific changes.
Affiliation(s)
- Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada (B.D., Y.S.)
- Department of Clinical Neurosciences, Faculty of Medicine, University of Calgary, Calgary, Alberta, Canada (C.C.P., J.J.K.)
- Clark Smith Brain Tumour Research Centre, Southern Alberta Cancer Research Institute, Faculty of Medicine, University of Calgary, Calgary, Alberta, Canada (C.C.P., C.S.P., W.W., J.J.K., M.D.B.)
- Hotchkiss Brain Institute, Faculty of Medicine, University of Calgary, Calgary, Alberta, Canada (H.A.L., O.D.S.)
|
15
|
Hernandez-Ferrer C, Quintela Garcia I, Danielski K, Carracedo Á, Pérez-Jurado LA, González JR. affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling. BMC Bioinformatics 2015; 16:167. [PMID: 25991004 PMCID: PMC4438530 DOI: 10.1186/s12859-015-0608-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/24/2014] [Accepted: 04/30/2015] [Indexed: 12/02/2022]
Abstract
Background: Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Other structural variants, such as copy number variants or DNA inversions, either germline or in mosaicism events, are now being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) in structural variant studies. Results: We illustrate the capabilities of affy2sv using two different complete pipelines on real data. The first performs a GWAS and a mosaic alteration detection study; the other detects CNVs and performs inversion calling. Conclusion: Both examples presented in the article show how affy2sv can be used as part of more complex pipelines that analyze Affymetrix SNP array data in genetic association studies where different types of structural variants are considered.
Affiliation(s)
- Carles Hernandez-Ferrer
- Center for Research in Environmental Epidemiology (CREAL), Doctor Aiguader 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain.
- Ines Quintela Garcia
- Grupo de Medicina Xenómica - Universidade de Santiago de Compostela, Santiago de Compostela, Spain; Centro Nacional de Genotipado - Instituto Carlos III, Santiago de Compostela, Spain.
- Ángel Carracedo
- Grupo de Medicina Xenómica - Universidade de Santiago de Compostela, Santiago de Compostela, Spain; CIBER Enfermedades Raras (CIBERER), Madrid, Spain; Fundación Pública Galega de Medicina Xenómica (SERGAS), Santiago de Compostela, Spain; King Abdulaziz University, Center of Excellence in Genomic Medicine Research, Jeddah, Saudi Arabia.
- Luis A Pérez-Jurado
- CIBER Enfermedades Raras (CIBERER), Madrid, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, Spain; IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain.
- Juan R González
- Center for Research in Environmental Epidemiology (CREAL), Doctor Aiguader 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain.
|
16
|
Excess of rare, inherited truncating mutations in autism. Nat Genet 2015; 47:582-8. [PMID: 25961944 PMCID: PMC4449286 DOI: 10.1038/ng.3303] [Citation(s) in RCA: 393] [Impact Index Per Article: 43.7] [Received: 11/14/2014] [Accepted: 04/20/2015] [Indexed: 12/15/2022]
Abstract
To assess the relative impact of inherited and de novo variants on autism risk, we generated a comprehensive set of exonic single nucleotide variants (SNVs) and copy number variants (CNVs) from 2,377 autism families. We find that private, inherited truncating SNVs in conserved genes are enriched in probands (odds ratio=1.14, p=0.0002) compared to unaffected siblings, an effect with significant maternal transmission bias to sons. We also observe a bias for inherited CNVs, specifically for small (<100 kbp), maternally inherited events (p=0.01) that are enriched in CHD8 target genes (p=7.4×10^-3). Using a logistic regression model, we show that private truncating SNVs and rare, inherited CNVs are statistically independent autism risk factors, with odds ratios of 1.11 (p=0.0002) and 1.23 (p=0.01), respectively. This analysis identifies a second class of candidate genes (e.g., RIMS1, CUL7, and LZTR1) where transmitted mutations may create a sensitized background but are unlikely to be completely penetrant.
|
17
|
Chagné D, Bianco L, Lawley C, Micheletti D, Jacobs JME. Methods for the design, implementation, and analysis of Illumina Infinium™ SNP assays in plants. Methods Mol Biol 2015; 1245:281-98. [PMID: 25373765 DOI: 10.1007/978-1-4939-1966-6_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/04/2022]
Abstract
The advent of Next-Generation sequencing-by-synthesis technologies has fuelled SNP discovery, genotyping, and screening of populations in myriad ways for many species, including various plant species. One technique widely applied to screening a large number of SNP markers over a large number of samples is the Illumina Infinium™ assay.
Affiliation(s)
- David Chagné
- The New Zealand Institute for Plant & Food Research Limited, Palmerston North Research Centre, Private Bag 11600, Palmerston North, 4442, New Zealand
|
18
|
Endometriosis is associated with rare copy number variants. PLoS One 2014; 9:e103968. [PMID: 25083881 PMCID: PMC4118997 DOI: 10.1371/journal.pone.0103968] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/09/2014] [Accepted: 07/09/2014] [Indexed: 12/22/2022]
Abstract
Endometriosis is a complex gynecological condition that affects 6-10% of women in their reproductive years and is defined by the presence of endometrial glands and stroma outside the uterus. Twin, family, and genome-wide association (GWA) studies have confirmed a genetic role, yet only a small part of the genetic risk can be explained by SNP variation. Copy number variants (CNVs) account for a greater portion of human genetic variation than SNPs and include more recent mutations of large effect. CNVs, likely to be prominent in conditions with decreased reproductive fitness, have not previously been examined as a genetic contributor to endometriosis. Here we employ a high-density genotyping microarray in a genome-wide survey of CNVs in a case-control population that includes 2,126 surgically confirmed endometriosis cases and 17,974 population controls of European ancestry. We apply stringent quality filters to reduce the false positive rate common to many CNV-detection algorithms from 77.7% to 7.3% without noticeable reduction in the true positive rate. We detected no differences in the CNV landscape between cases and controls on the global level, which showed an average of 1.92 CNVs per individual with an average size of 142.3 kb. On the local level we identify 22 CNV-regions at the nominal significance threshold (P<0.05), which is greater than the 8.15 CNV-regions expected based on permutation analysis (P<0.001). Three CNVs passed a genome-wide P-value threshold of 9.3 × 10^-4: a deletion at SGCZ on 8p22 (P = 7.3 × 10^-4, OR = 8.5, CI = 2.3-31.7), a deletion in MALRD1 on 10p12.31 (P = 5.6 × 10^-4, OR = 14.1, CI = 2.7-90.9), and a deletion at 11q14.1 (P = 5.7 × 10^-4, OR = 33.8, CI = 3.3-1651). Two SNPs within the 22 CNVRs show significant genotypic association with endometriosis after adjusting for multiple testing: rs758316 in DPP6 on 7q36.2 (P = 0.0045) and rs4837864 in ASTN2 on 9q33.1 (P = 0.0002). Together, the CNV-loci are detected in 6.9% of affected women compared to 2.1% in the general population.
|
19
|
Liu R, Dai Z, Yeager M, Irizarry RA, Ritchie ME. KRLMM: an adaptive genotype calling method for common and low frequency variants. BMC Bioinformatics 2014; 15:158. [PMID: 24886250 PMCID: PMC4064501 DOI: 10.1186/1471-2105-15-158] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/24/2014] [Accepted: 05/19/2014] [Indexed: 11/10/2022]
Abstract
Background: SNP genotyping microarrays have revolutionized the study of complex disease. The current range of commercially available genotyping products contains extensive catalogues of low frequency and rare variants. Existing SNP calling algorithms have difficulty dealing with these low frequency variants, as the underlying models rely on each genotype having a reasonable number of observations to ensure accurate clustering. Results: Here we develop KRLMM, a new method for converting raw intensities into genotype calls that aims to overcome this issue. Our method is unique in that it applies careful between-sample normalization and allows a variable number of clusters k (1, 2 or 3) for each SNP, where k is predicted using the available data. We compare our method to four genotyping algorithms (GenCall, GenoSNP, Illuminus and OptiCall) on several Illumina data sets that include samples from the HapMap project where the true genotypes are known in advance. All methods were found to have high overall accuracy (> 98%), with KRLMM consistently amongst the best. At low minor allele frequency, the KRLMM, OptiCall and GenoSNP algorithms were observed to be consistently more accurate than GenCall and Illuminus on our test data. Conclusions: Methods that tailor their approach to calling low frequency variants by either varying the number of clusters (KRLMM) or using information from other SNPs (OptiCall and GenoSNP) offer improved accuracy over methods that do not (GenCall and Illuminus). The KRLMM algorithm is implemented in the open-source crlmm package distributed via the Bioconductor project (http://www.bioconductor.org).
Affiliation(s)
- Rafael A Irizarry
- Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.
|
20
|
Smith ML, Baggerly KA, Bengtsson H, Ritchie ME, Hansen KD. illuminaio: An open source IDAT parsing tool for Illumina microarrays. F1000Res 2013; 2:264. [PMID: 24701342 PMCID: PMC3968891 DOI: 10.12688/f1000research.2-264.v1] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Accepted: 12/04/2013] [Indexed: 11/21/2022]
Abstract
The IDAT file format is used to store BeadArray data from the myriad of genomewide profiling platforms on offer from Illumina Inc. This proprietary format is output directly from the scanner and stores summary intensities for each probe-type on an array in a compact manner. A lack of open source tools to process IDAT files has hampered their uptake by the research community beyond the standard step of using the vendor’s software to extract the data they contain in a human readable text format. To fill this void, we have developed the illuminaio package that parses IDAT files from any BeadArray platform, including the decryption of files from Illumina’s gene expression arrays. illuminaio provides the first open-source package for this task, and will promote wider uptake of the IDAT format as a standard for sharing Illumina BeadArray data in public databases, in the same way that the CEL file serves as the standard for the Affymetrix platform.
Affiliation(s)
- Mike L Smith
- CRUK Cambridge Institute, Li Ka Shing Centre, The University of Cambridge, Cambridge, CB2 0RE, UK
- Keith A Baggerly
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Henrik Bengtsson
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94107, USA
- Matthew E Ritchie
- Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia; Department of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3052, Australia
- Kasper D Hansen
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
|
21
|
Quigley DA, Fiorito E, Nord S, Van Loo P, Alnæs GG, Fleischer T, Tost J, Moen Vollan HK, Tramm T, Overgaard J, Bukholm IR, Hurtado A, Balmain A, Børresen-Dale AL, Kristensen V. The 5p12 breast cancer susceptibility locus affects MRPS30 expression in estrogen-receptor positive tumors. Mol Oncol 2013; 8:273-84. [PMID: 24388359 DOI: 10.1016/j.molonc.2013.11.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 09/10/2013] [Revised: 11/19/2013] [Accepted: 11/21/2013] [Indexed: 11/17/2022]
Abstract
Genome-wide association studies have identified numerous loci linked to breast cancer susceptibility, but the mechanism by which variations at these loci influence susceptibility is usually unknown. Some variants are only associated with particular clinical subtypes of breast cancer. Understanding how and why these variants influence subtype-specific cancer risk contributes to our understanding of cancer etiology. We conducted a genome-wide expression Quantitative Trait Locus (eQTL) study in a discovery set of 287 breast tumors and 97 normal mammary tissue samples and a replication set of 235 breast tumors. We found that the risk-associated allele of rs7716600 in the 5p12 estrogen receptor-positive (ER-positive) susceptibility locus was associated with elevated expression of the nearby gene MRPS30 exclusively in ER-positive tumors. We replicated this finding in 235 independent tumors. Further, we showed the rs7716600 risk genotype was associated with decreased MRPS30 promoter methylation exclusively in ER-positive breast tumors. In vitro studies in MCF-7 cells carrying the protective genotype showed that estrogen stimulation decreased MRPS30 promoter chromatin availability and mRNA levels. In contrast, in 600MPE cells carrying the risk genotype, estrogen increased MRPS30 expression and did not affect promoter availability. Our data suggest the 5p12 risk allele affects MRPS30 expression in estrogen-responsive tumor cells after tumor initiation by a mechanism affecting chromatin availability. These studies emphasize that the genetic architecture of breast cancer is context-specific, and integrated analysis of gene expression and chromatin remodeling in normal and tumor tissues will be required to explain the mechanisms of risk alleles.
Affiliation(s)
- David A Quigley
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway; Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, San Francisco, USA.
- Elisa Fiorito
- Breast Cancer Research Group, Nordic EMBL Partnership, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway.
- Silje Nord
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Peter Van Loo
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK; Department of Human Genetics, VIB and KU Leuven, Leuven, Belgium.
- Grethe Grenaker Alnæs
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Thomas Fleischer
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Jorg Tost
- Laboratory for Functional Genomics, Fondation Jean Dausset, Centre Etude Polymorphism Humain (CEPH), Paris, France; Laboratory of Epigenetics, Centre National de Génotypage, Commissariat à l'énergie Atomique et aux énergies Alternatives (CEA)-Institut de Génomique, Evry, France.
- Hans Kristian Moen Vollan
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Trine Tramm
- Department of Experimental Clinical Oncology, Aarhus University Hospital, Aarhus, Denmark.
- Jens Overgaard
- Department of Experimental Clinical Oncology, Aarhus University Hospital, Aarhus, Denmark.
- Ida R Bukholm
- Department of Breast-Endocrine Surgery, Akershus University Hospital, Oslo, Norway; Department of Oncology, Division of Cancer Medicine, Surgery and Transplantation, Oslo University Hospital, Oslo, Norway.
- Antoni Hurtado
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway; Breast Cancer Research Group, Nordic EMBL Partnership, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway.
- Allan Balmain
- Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, San Francisco, USA.
- Anne-Lise Børresen-Dale
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Vessela Kristensen
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway; Department of Clinical Molecular Biology (EpiGen), Medical Division, Akershus University Hospital, Lørenskog, Norway.
|
22
|
Lambert G, Tsinajinnie D, Duggan D. Single Nucleotide Polymorphism Genotyping Using BeadChip Microarrays. Curr Protoc Hum Genet 2013; Chapter 2:Unit 2.9. [DOI: 10.1002/0471142905.hg0209s78] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Affiliation(s)
- Gilliam Lambert
- Genetic Basis of Human Disease Division, Translational Genomics Research Institute (TGen), Phoenix, Arizona
- Darwin Tsinajinnie
- Genetic Basis of Human Disease Division, Translational Genomics Research Institute (TGen), Phoenix, Arizona
- David Duggan
- Genetic Basis of Human Disease Division, Translational Genomics Research Institute (TGen), Phoenix, Arizona
|
23
|
Triche TJ, Weisenberger DJ, Van Den Berg D, Laird PW, Siegmund KD. Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res 2013; 41:e90. [PMID: 23476028 PMCID: PMC3627582 DOI: 10.1093/nar/gkt090] [Citation(s) in RCA: 527] [Impact Index Per Article: 47.9] [Indexed: 01/24/2023]
Abstract
We propose a novel approach to background correction for Infinium HumanMethylation data to account for technical variation in background fluorescence signal. Our approach capitalizes on a new use for the Infinium I design bead types to measure non-specific fluorescence in the colour channel opposite of their design (Cy3/Cy5). This provides tens of thousands of features for measuring background instead of the much smaller number of negative control probes on the platforms (n = 32 for HumanMethylation27 and n = 614 for HumanMethylation450, respectively). We compare the performance of our methods with existing approaches, using technical replicates of both mixture samples and biological samples, and demonstrate that within- and between-platform artefacts can be substantially reduced, with concomitant improvement in sensitivity, by the proposed methods.
Affiliation(s)
- Timothy J Triche
- Department of Preventive Medicine, USC Keck School of Medicine of USC, Los Angeles, CA 90089, USA.
|
24
|
Ha G, Shah S. Distinguishing somatic and germline copy number events in cancer patient DNA hybridized to whole-genome SNP genotyping arrays. Methods Mol Biol 2013; 973:355-372. [PMID: 23412801 DOI: 10.1007/978-1-62703-281-0_22] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
Chromosomal aneuploidy and segmental copy number changes are common genomic aberrations in cancer. Copy number alterations (CNAs) arise from deletions, insertions, or duplications resulting in chromosomal aberrations and aneuploidy. Genomes of normal cells also exhibit variable copy number called germline copy number variants (CNVs). CNVs in the general population tend to confound interpretation of predictions when attempting to extract relevant driver somatic events in cancer. In large studies of CNAs in cancer patients, it becomes necessary to accurately identify and separate CNAs and CNVs so as to prioritize candidate tumor suppressors and oncogenes. We have developed a probabilistic approach, HMM-Dosage, for segmenting and distinguishing CNAs and CNVs as separate, discrete events in cancer SNP genotyping array data. We outline the steps and computer code for the analysis of whole-genome cancer DNA hybridized to SNP genotyping arrays, focusing on distinguishing somatic CNA and germline CNVs, and describe the combined approach of HMM-Dosage for probabilistic inference and classification of somatic and germline copy number changes.
Affiliation(s)
- Gavin Ha
- Molecular Oncology, BC Cancer Agency, Vancouver, BC, Canada.
|
25
|
Idbaih A, Ducray F, Dehais C, Courdy C, Carpentier C, de Bernard S, Uro-Coste E, Mokhtari K, Jouvet A, Honnorat J, Chinot O, Ramirez C, Beauchesne P, Benouaich-Amiel A, Godard J, Eimer S, Parker F, Lechapt-Zalcman E, Colin P, Loussouarn D, Faillot T, Dam-Hieu P, Elouadhani-Hamdi S, Bauchet L, Langlois O, Le Guerinel C, Fontaine D, Vauleon E, Menei P, Fotso MJM, Desenclos C, Verrelle P, Ghiringhelli F, Noel G, Labrousse F, Carpentier A, Dhermain F, Delattre JY, Figarella-Branger D. SNP array analysis reveals novel genomic abnormalities including copy neutral loss of heterozygosity in anaplastic oligodendrogliomas. PLoS One 2012; 7:e45950. [PMID: 23071531 PMCID: PMC3468603 DOI: 10.1371/journal.pone.0045950] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 06/19/2012] [Accepted: 08/23/2012] [Indexed: 12/16/2022]
Abstract
Anaplastic oligodendrogliomas (AOD) are rare glial tumors in adults with relatively homogeneous clinical, radiological and histological features at the time of diagnosis but markedly variable clinical courses. Studies have identified several molecular abnormalities with clinical or biological relevance to AOD (e.g. t(1;19)(q10;p10), IDH1, IDH2, CIC and FUBP1 mutations). To better characterize the clinical and biological behavior of this tumor type, the creation of a national multicentric network, named “Prise en charge des OLigodendrogliomes Anaplasiques (POLA),” has been supported by the Institut National du Cancer (InCA). Newly diagnosed and centrally validated AOD patients and their related biological material (tumor and blood samples) were prospectively included in the POLA clinical database and tissue bank, respectively. At the molecular level, we have conducted a high-resolution single nucleotide polymorphism array analysis, which included 83 patients. Despite a careful central pathological review, AOD have been found to exhibit heterogeneous genomic features. A total of 82% of the tumors exhibited a 1p/19q-co-deletion, while 18% harbored a distinct chromosome pattern. Novel focal abnormalities, including homozygously deleted, amplified and disrupted regions, have been identified. Recurring copy neutral losses of heterozygosity (CNLOH) inducing the modulation of gene expression have also been discovered. CNLOH in the CDKN2A locus was associated with protein silencing in 1/3 of the cases. In addition, FUBP1 homozygous deletion was detected in one case suggesting a putative tumor suppressor role of FUBP1 in AOD. Our study showed that the genomic and pathological analyses of AOD are synergistic in detecting relevant clinical and biological subgroups of AOD.
Affiliation(s)
- Ahmed Idbaih
- Université Pierre et Marie Curie-Paris 6, Centre de Recherche de l'Institut du Cerveau et de la Moelle Epinière (CRICM), UMRS 975, Paris, France.
|
26
|
Valsesia A, Stevenson BJ, Waterworth D, Mooser V, Vollenweider P, Waeber G, Jongeneel CV, Beckmann JS, Kutalik Z, Bergmann S. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 2012; 13:241. [PMID: 22702538 PMCID: PMC3464625 DOI: 10.1186/1471-2164-13-241] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/28/2011] [Accepted: 06/15/2012] [Indexed: 01/11/2023]
Abstract
BACKGROUND Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Affiliation(s)
- Armand Valsesia
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
|
27
|
Liu R, Maia AT, Russell R, Caldas C, Ponder BA, Ritchie ME. Allele-specific expression analysis methods for high-density SNP microarray data. Bioinformatics 2012; 28:1102-8. [PMID: 22355082 DOI: 10.1093/bioinformatics/bts089] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/18/2022]
Abstract
MOTIVATION In the past decade, a number of technologies to quantify allele-specific expression (ASE) in a genome-wide manner have become available to researchers. We investigate the application of single-nucleotide polymorphism (SNP) microarrays to this task, exploring data obtained from both cell lines and primary tissue for which both RNA and DNA profiles are available. RESULTS We analyze data from two experiments that make use of high-density Illumina Infinium II genotyping arrays to measure ASE. We first preprocess each data set, which involves removal of outlier samples, careful normalization and a two-step filtering procedure to remove SNPs that show no evidence of expression in the samples being analyzed and calls that are clear genotyping errors. We then compare three different tests for detecting ASE, one of which has been previously published and two novel approaches. These tests vary at the level at which they operate (per SNP per individual or per SNP) and in the input data they require. Using SNPs from imprinted genes as true positives for ASE, we observe varying sensitivity for the different testing procedures that improves with increasing sample size. Methods that rely on RNA signal alone were found to perform best across a range of metrics. The top ranked SNPs recovered by all methods appear to be reasonable candidates for ASE. AVAILABILITY AND IMPLEMENTATION Analysis was carried out in R (http://www.R-project.org/) using existing functions.
Affiliation(s)
- Ruijie Liu
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
|
28
|
Li G, Gelernter J, Kranzler HR, Zhao H. M(3): an improved SNP calling algorithm for Illumina BeadArray data. Bioinformatics 2011; 28:358-65. [PMID: 22155947 DOI: 10.1093/bioinformatics/btr673] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
SUMMARY Genotype calling from high-throughput platforms such as Illumina and Affymetrix is a critical step in data processing, so that accurate information on genetic variants can be obtained for phenotype-genotype association studies. A number of algorithms have been developed to infer genotypes from data generated through the Illumina BeadStation platform, including GenCall, GenoSNP, Illuminus and CRLMM. Most of these algorithms are built on population-based statistical models to genotype every SNP in turn, such as GenCall with the GenTrain clustering algorithm, and require a large reference population to perform well. These approaches may not work well for rare variants where only a small proportion of the individuals carry the variant. A fundamentally different approach, implemented in GenoSNP, adopts a single nucleotide polymorphism (SNP)-based model to infer genotypes of all the SNPs in one individual, making it an appealing alternative to call rare variants. However, compared to the population-based strategies, more SNPs in GenoSNP may fail the Hardy-Weinberg Equilibrium test. To take advantage of both strategies, we propose a two-stage SNP calling procedure, named the modified mixture model (M(3)), to improve call accuracy for both common and rare variants. The effectiveness of our approach is demonstrated through applications to genotype calling on a set of HapMap samples used for quality control purpose in a large case-control study of cocaine dependence. The increase in power with M(3) is greater for rare variants than for common variants depending on the model. AVAILABILITY M(3) algorithm: http://bioinformatics.med.yale.edu/group. CONTACT name@bio.com; hongyu.zhao@yale.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gengxin Li
- Biostatistics Division, Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA
|
29
|
Abstract
The focus of this review is software for the genotyping of microarray single nucleotide polymorphisms, in particular software for Affymetrix and Illumina arrays. Different statistical principles and ideas have been applied to the construction of genotyping algorithms -- for example, likelihood versus Bayesian modelling, and whether to genotype one or all arrays at a time. The release of new arrays is generally followed by new, or updated, algorithms.
|
30
|
Halper-Stromberg E, Frelin L, Ruczinski I, Scharpf R, Jie C, Carvalho B, Hao H, Hetrick K, Jedlicka A, Dziedzic A, Doheny K, Scott AF, Baylin S, Pevsner J, Spencer F, Irizarry RA. Performance assessment of copy number microarray platforms using a spike-in experiment. Bioinformatics 2011; 27:1052-60. [PMID: 21478196 DOI: 10.1093/bioinformatics/btr106] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
Abstract
MOTIVATION Changes in the copy number of chromosomal DNA segments [copy number variants (CNVs)] have been implicated in human variation, heritable diseases and cancers. Microarray-based platforms are the current established technology of choice for studies reporting these discoveries and constitute the benchmark against which emergent sequence-based approaches will be evaluated. Research that depends on CNV analysis is rapidly increasing, and systematic platform assessments that distinguish strengths and weaknesses are needed to guide informed choice. RESULTS We evaluated the sensitivity and specificity of six platforms, provided by four leading vendors, using a spike-in experiment. NimbleGen and Agilent platforms outperformed Illumina and Affymetrix in accuracy and precision of copy number dosage estimates. However, Illumina and Affymetrix algorithms that leverage single nucleotide polymorphism (SNP) information make up for this disadvantage and perform well at variant detection. Overall, the NimbleGen 2.1M platform outperformed others, but only with the use of an alternative data analysis pipeline to the one offered by the manufacturer. AVAILABILITY The data is available from http://rafalab.jhsph.edu/cnvcomp/. CONTACT pevsner@jhmi.edu; fspencer@jhmi.edu; rafa@jhu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eitan Halper-Stromberg
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
|
31
|
Hah N, Danko CG, Core L, Waterfall JJ, Siepel A, Lis JT, Kraus WL. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 2011; 145:622-34. [PMID: 21549415 DOI: 10.1016/j.cell.2011.03.042] [Citation(s) in RCA: 368] [Impact Index Per Article: 28.3] [Received: 11/16/2010] [Revised: 02/07/2011] [Accepted: 03/24/2011] [Indexed: 01/13/2023]
Abstract
We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using global run-on and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein-coding genes, estrogen regulates the distribution and activity of all three RNA polymerases and virtually every class of noncoding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems.
Affiliation(s)
- Nasun Hah
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
|
32
|
Scharpf RB, Irizarry RA, Ritchie ME, Carvalho B, Ruczinski I. Using the R Package crlmm for Genotyping and Copy Number Estimation. J Stat Softw 2011; 40:1-32. [PMID: 22523482 PMCID: PMC3329223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for batch effects and provides allele-specific estimates of copy number. This paper illustrates a workflow for the estimation of allele-specific copy number and integration of the marker-level estimates with complementary Bioconductor software for inferring regions of copy number gain or loss. All analyses are performed in the statistical environment R.
Affiliation(s)
- Robert B. Scharpf
- Department of Oncology, Johns Hopkins University School of Medicine, 550 N. Broadway, Suite 1103, Baltimore, MD 21218, United States of America
- Rafael A. Irizarry
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore MD 21218, United States of America
- Matthew E. Ritchie
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Benilton Carvalho
- Department of Oncology, University of Cambridge, CRUK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore MD 21218, United States of America
|
33
|
Ritchie ME, Liu R, Carvalho BS, Irizarry RA. Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips. BMC Bioinformatics 2011; 12:68. [PMID: 21385424 PMCID: PMC3063825 DOI: 10.1186/1471-2105-12-68] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 11/08/2010] [Accepted: 03/08/2011] [Indexed: 12/02/2022]
Abstract
Background Illumina's Infinium SNP BeadChips are extensively used in both small and large-scale genetic studies. A fundamental step in any analysis is the processing of raw allele A and allele B intensities from each SNP into genotype calls (AA, AB, BB). Various algorithms which make use of different statistical models are available for this task. We compare four methods (GenCall, Illuminus, GenoSNP and CRLMM) on data where the true genotypes are known in advance and data from a recently published genome-wide association study. Results In general, differences in accuracy are relatively small between the methods evaluated, although CRLMM and GenoSNP were found to consistently outperform GenCall. The performance of Illuminus is heavily dependent on sample size, with lower no call rates and improved accuracy as the number of samples available increases. For X chromosome SNPs, methods with sex-dependent models (Illuminus, CRLMM) perform better than methods which ignore gender information (GenCall, GenoSNP). We observe that CRLMM and GenoSNP are more accurate at calling SNPs with low minor allele frequency than GenCall or Illuminus. The sample quality metrics from each of the four methods were found to have a high level of agreement at flagging samples with unusual signal characteristics. Conclusions CRLMM, GenoSNP and GenCall can be applied with confidence in studies of any size, as their performance was shown to be invariant to the number of samples available. Illuminus on the other hand requires a larger number of samples to achieve comparable levels of accuracy and its use in smaller studies (50 or fewer individuals) is not recommended.
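To make the calling step concrete, the sketch below turns a pair of allele A and allele B intensities into a call (AA, AB, BB) or a no-call by thresholding the B-allele fraction. This is a deliberately naive illustration: the function name, the thresholds and the minimum-intensity cutoff are all hypothetical, and none of the four compared methods (GenCall, Illuminus, GenoSNP, CRLMM) is claimed to work this way; they rely on statistical models rather than fixed cutoffs.

```python
def call_genotype(a, b, low=0.25, high=0.75, min_total=0.2):
    """Naive genotype call from normalized allele A and allele B intensities.

    low, high: hypothetical cutoffs on the B-allele fraction b / (a + b).
    min_total: hypothetical minimum total intensity; weaker signals are
    returned as "NC" (no call).
    """
    total = a + b
    if total <= 0 or total < min_total:
        return "NC"
    frac_b = b / total
    if frac_b < low:
        return "AA"
    if frac_b > high:
        return "BB"
    return "AB"


# Hypothetical intensity pairs: mostly A, balanced, mostly B, and too weak.
calls = [call_genotype(1.0, 0.05), call_genotype(0.5, 0.5),
         call_genotype(0.05, 1.0), call_genotype(0.05, 0.05)]
```

Fixed cutoffs like these tend to break down when cluster positions shift between SNPs or batches, which is one reason model-based callers are compared in studies of this kind.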
Affiliation(s)
- Matthew E Ritchie
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.
|
34
|
Gidskehaug L, Kent M, Hayes BJ, Lien S. Genotype calling and mapping of multisite variants using an Atlantic salmon iSelect SNP array. Bioinformatics 2010; 27:303-10. [DOI: 10.1093/bioinformatics/btq673] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/14/2022]
|
35
|
Abstract
MOTIVATION The availability of flexible open source software for the analysis of gene expression raw level data has greatly facilitated the development of widely used preprocessing methods for these technologies. However, the expansion of microarray applications has exposed the limitation of existing tools. RESULTS We developed the oligo package to provide a more general solution that supports a wide range of applications. The package is based on the BioConductor principles of transparency, reproducibility and efficiency of development. It extends the existing tools and leverages existing code for visualization, accessing data and widely used preprocessing routines. The oligo package implements a unified paradigm for preprocessing data and interfaces with other BioConductor tools for downstream analysis. Our infrastructure is general and can be used by other BioConductor packages. AVAILABILITY The oligo package is freely available through BioConductor, http://www.bioconductor.org.
Affiliation(s)
- Benilton S Carvalho
- Department of Oncology, University of Cambridge, CRUK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
|
36
|
Bengtsson H, Neuvial P, Speed TP. TumorBoost: normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics 2010; 11:245. [PMID: 20462408 PMCID: PMC2894037 DOI: 10.1186/1471-2105-11-245] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 12/08/2009] [Accepted: 05/12/2010] [Indexed: 12/15/2022]
Abstract
Background High-throughput genotyping microarrays assess both total DNA copy number and allelic composition, which makes them a tool of choice for copy number studies in cancer, including total copy number and loss of heterozygosity (LOH) analyses. Even after state of the art preprocessing methods, allelic signal estimates from genotyping arrays still suffer from systematic effects that make them difficult to use effectively for such downstream analyses. Results We propose a method, TumorBoost, for normalizing allelic estimates of one tumor sample based on estimates from a single matched normal. The method applies to any paired tumor-normal estimates from any microarray-based technology, combined with any preprocessing method. We demonstrate that it increases the signal-to-noise ratio of allelic signals, making it significantly easier to detect allelic imbalances. Conclusions TumorBoost increases the power to detect somatic copy-number events (including copy-neutral LOH) in the tumor from allelic signals of Affymetrix or Illumina origin. We also conclude that high-precision allelic estimates can be obtained from a single pair of tumor-normal hybridizations, if TumorBoost is combined with single-array preprocessing methods such as (allele-specific) CRMA v2 for Affymetrix or BeadStudio's (proprietary) XY-normalization method for Illumina. A bounded-memory implementation is available in the open-source and cross-platform R package aroma.cn, which is part of the Aroma Project (http://www.aroma-project.org/).
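The general idea of correcting a tumor's allelic signal with its matched normal can be sketched as follows. This is a simplified illustration, not the published TumorBoost algorithm, whose exact formula the abstract does not give. It assumes the normal sample's allele B fraction should sit at 0, 0.5 or 1 depending on the normal genotype, treats any deviation as SNP-specific systematic bias, and subtracts that bias from the tumor value. The function name and the example values are hypothetical.

```python
# Expected allele B fraction in the normal sample for each genotype.
EXPECTED_FRACTION = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def normalize_tumor_fraction(beta_tumor, beta_normal, normal_genotype):
    """Subtract the normal sample's SNP-specific bias from the tumor signal.

    beta_tumor, beta_normal: allele B fractions at one SNP.
    normal_genotype: genotype call of the matched normal ("AA", "AB", "BB").
    """
    bias = beta_normal - EXPECTED_FRACTION[normal_genotype]
    return beta_tumor - bias


# Hypothetical heterozygous SNP whose signal reads 0.06 too high in the normal.
corrected = normalize_tumor_fraction(0.62, 0.56, "AB")
```

After such a correction, heterozygous SNPs in balanced regions should cluster more tightly around 0.5, which is what makes allelic imbalance easier to see.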
Affiliation(s)
- Henrik Bengtsson
- Department of Statistics, University of California, Berkeley, USA.
|
37
|
Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics 2009; 26:242-9. [PMID: 19906825 DOI: 10.1093/bioinformatics/btp624] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used by GWAS to determine association between loci and disease is making genotype calls (AA, AB or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays and different sample batches has a substantial influence on the accuracy of genotype calls made by existing algorithms. Failure to account for these sources of variability can adversely affect the quality of findings reported by the GWAS. RESULTS We developed a method based on an enhanced version of the multi-level model used by CRLMM version 1. Two key differences are that we now account for variability across batches and improve the call-specific assessment of each call. The new model permits the development of quality metrics for SNPs, samples and batches of samples. Using three independent datasets, we demonstrate that the CRLMM version 2 outperforms CRLMM version 1 and the algorithm provided by Affymetrix, Birdseed. The main advantage of the new approach is that it enables the identification of low-quality SNPs, samples and batches. AVAILABILITY Software implementing the method described in this article is available as free and open source code in the crlmm R/BioConductor package.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benilton S Carvalho
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
|