1
Stahl K, Papiol S, Budde M, Heilbronner M, Oraki Kohshour M, Falkai P, Schulze TG, Heilbronner U, Bickeböller H. Aggregating single nucleotide polymorphisms improves filtering for false-positive associations postimputation. G3 (BETHESDA, MD.) 2025; 15:jkaf043. [PMID: 40053832 PMCID: PMC12060241 DOI: 10.1093/g3journal/jkaf043] [Received: 01/03/2025] [Revised: 02/19/2025] [Accepted: 02/21/2025] [Indexed: 03/09/2025]
Abstract
Imputation causes bias in P-values in downstream genome-wide association studies. Imputation quality measures such as IMPUTE info are used to discriminate between false and true associations. However, implementing a high threshold often discards true associations, while a low threshold preserves false associations. This poses a challenge, especially for studies genotyped with SNP arrays. In practice, association signals register as spikes of low P-values for SNPs in close proximity owing to linkage disequilibrium, but postimputation filtering is conducted on SNPs independently. We simulated 1536 small case-control studies on the human chromosome 19 both to quantify the introduced bias and to evaluate postimputation filtering. The established IMPUTE info thresholds 0.3 and 0.8 were compared on individual SNPs and aggregated spikes in the formats "best guess genotype" and "dosage." Furthermore, we applied 2 recently published methods, Iam hiQ and MagicalRsq, to assess their effect on filtering. We found differences in false signals and imputation quality between the genotype formats, especially in the midrange between thresholds. In this midrange, 51 and 60% of associated SNPs for best guess and dosage format, respectively, are true associations. For aggregated SNPs, the majority of spikes in the midrange are true associations. We propose a new method, the Midrange Filter, which uses both thresholds and formats to classify spikes instead of SNPs. This method discards up to the same number of false signals as the upper threshold, while preserving all true associations in most simulation settings. The PsyCourse study is included as a real-data application.
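The two IMPUTE info thresholds named in the abstract (0.3 and 0.8) split imputed SNPs into three bands. The sketch below illustrates only this per-SNP banding and single-threshold filtering; it is not the authors' Midrange Filter, which classifies aggregated spikes and whose details are not given here. The `Snp` record, the identifiers, the scores, and the choice of inclusive lower bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds named in the abstract; treating them as inclusive lower
# bounds is an assumption of this sketch.
LOWER, UPPER = 0.3, 0.8


@dataclass
class Snp:
    rsid: str       # hypothetical identifier
    info: float     # IMPUTE info score of the imputed SNP
    p_value: float  # association P-value (not used for banding)


def info_band(info, lower=LOWER, upper=UPPER):
    """Return 'low', 'midrange', or 'high' for an info score."""
    if info < lower:
        return "low"
    if info < upper:
        return "midrange"
    return "high"


def filter_snps(snps, threshold):
    """Keep SNPs whose info score reaches the given threshold."""
    return [s for s in snps if s.info >= threshold]


# Made-up example values, not data from the study.
snps = [
    Snp("rs_a", 0.25, 1e-6),
    Snp("rs_b", 0.55, 3e-7),
    Snp("rs_c", 0.92, 2e-8),
]
bands = {s.rsid: info_band(s.info) for s in snps}
kept_lenient = [s.rsid for s in filter_snps(snps, LOWER)]
kept_strict = [s.rsid for s in filter_snps(snps, UPPER)]
```

In this toy input, the midrange SNP survives the lower threshold but is discarded by the upper one, which is the trade-off the abstract describes.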
Affiliation(s)
- Katharina Stahl
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen 37073, Germany
- Sergi Papiol
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
  - Department of Psychiatry and Psychotherapy, LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
  - Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
- Monika Budde
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Maria Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Mojtaba Oraki Kohshour
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
  - Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
  - Department of Immunology, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz 61357-15794, Iran
- Peter Falkai
  - Department of Psychiatry and Psychotherapy, LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
  - Department Clinical Translation, Max Planck Institute of Psychiatry, Munich 80804, Germany
  - German Center for Mental Health (DZPG), partner site Munich/Augsburg, Munich 80336, Germany
- Thomas G Schulze
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
  - German Center for Mental Health (DZPG), partner site Munich/Augsburg, Munich 80336, Germany
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY 13210, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Urs Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, Ludwig Maximilian University of Munich, Munich 80336, Germany
- Heike Bickeböller
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen 37073, Germany
2
Thormann KA, Tozzi V, Starke P, Bickeböller H, Baum M, Rosenberger A. ImputAccur: fast and user-friendly calculation of genotype-imputation accuracy-measures. BMC Bioinformatics 2022; 23:316. [PMID: 35927623 PMCID: PMC9351229 DOI: 10.1186/s12859-022-04863-z] [Received: 04/28/2022] [Accepted: 07/27/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: ImputAccur is a software tool to measure genotype-imputation accuracy. Imputation of untyped markers is a standard approach in genome-wide association studies to close the gap between directly genotyped and other known DNA variants. However, high accuracy for imputed genotypes is fundamental. Several accuracy measures have been proposed, but unfortunately, they are implemented on different platforms, which is impractical.
RESULTS: With ImputAccur, the accuracy measures info, Iam-hiQ and r2-based indices can be derived from standard output files of imputation software. Sample/probe and marker filtering is possible. This allows, e.g., accurate marker filtering ahead of data analysis.
CONCLUSIONS: The source code (Python version 3.9.4), a standalone executable file, and example data for ImputAccur are freely available at https://gitlab.gwdg.de/kolja.thormann1/imputationquality.git .
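To make the "info" accuracy measure mentioned above concrete, here is a minimal sketch of the commonly cited IMPUTE info definition, computed for one SNP from per-sample genotype probabilities. It is not taken from the ImputAccur source code; the function name `impute_info` and the input layout (a list of `(p0, p1, p2)` tuples) are assumptions for illustration. With expected dosage e_i = p1 + 2*p2, f_i = p1 + 4*p2, and allele frequency theta = sum(e_i) / (2N), the score is 1 - sum(f_i - e_i^2) / (2N * theta * (1 - theta)), and is set to 1 when theta is 0 or 1.

```python
def impute_info(probs):
    """IMPUTE-style info score for a single SNP.

    probs: list of (p0, p1, p2) genotype probabilities, one tuple per
    sample (hypothetical input layout chosen for this sketch).
    """
    n = len(probs)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Expected allele dosage and expected squared dosage per sample.
    e = [p1 + 2.0 * p2 for _, p1, p2 in probs]
    f = [p1 + 4.0 * p2 for _, p1, p2 in probs]

    # Estimated allele frequency from the expected dosages.
    theta = sum(e) / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        return 1.0  # monomorphic case, info is defined as 1

    # Sum of per-sample genotype variances under the imputation posterior.
    var_sum = sum(fi - ei * ei for fi, ei in zip(f, e))
    return 1.0 - var_sum / (2.0 * n * theta * (1.0 - theta))


# Fully certain genotypes carry no imputation uncertainty.
certain = impute_info([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])

# Probabilities equal to Hardy-Weinberg proportions at frequency 0.5
# add nothing beyond the allele frequency itself.
uninformative = impute_info([(0.25, 0.5, 0.25)] * 4)
```

A score near 1 indicates genotypes imputed with high certainty, while a score near 0 indicates that the posterior probabilities are no more informative than the allele frequency alone.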
Affiliation(s)
- Kolja A Thormann
  - Institute of Computer Science, Georg-August-University Göttingen, 37077, Göttingen, Germany
- Viola Tozzi
  - Department of Genetic Epidemiology, University Medical Center Göttingen, 37079, Göttingen, Germany
- Paula Starke
  - Department of Genetic Epidemiology, University Medical Center Göttingen, 37079, Göttingen, Germany
- Heike Bickeböller
  - Department of Genetic Epidemiology, University Medical Center Göttingen, 37079, Göttingen, Germany
- Marcus Baum
  - Institute of Computer Science, Georg-August-University Göttingen, 37077, Göttingen, Germany
- Albert Rosenberger
  - Department of Genetic Epidemiology, University Medical Center Göttingen, 37079, Göttingen, Germany