1
Betschart RO, Riccio C, Aguilera-Garcia D, Blankenberg S, Guo L, Moch H, Seidl D, Solleder H, Thalén F, Thiéry A, Twerenbold R, Zeller T, Zoche M, Ziegler A. Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control. Biom J 2024; 66:e202300278. [PMID: 38988195 DOI: 10.1002/bimj.202300278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 07/12/2024]
Abstract
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not been working with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time increased linearly with genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
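The Het/Hom ratio named among the QC metrics above can be illustrated with a minimal sketch. It assumes the common definition (heterozygous calls divided by homozygous non-reference calls) and VCF-style genotype strings; the function name `het_hom_ratio` is hypothetical, and the abstract does not state which exact definition or tool the authors used.

```python
from collections import Counter


def het_hom_ratio(genotypes):
    """Return het / hom-alt for an iterable of VCF-style GT strings.

    Assumption: the ratio is heterozygous calls over homozygous
    non-reference calls; homozygous-reference calls are ignored.
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        first, second = alleles
        if first != second:
            counts["het"] += 1
        elif first != "0":
            counts["hom_alt"] += 1
    if counts["hom_alt"] == 0:
        return float("nan")  # ratio undefined without hom-alt calls
    return counts["het"] / counts["hom_alt"]


# Toy example: three heterozygous calls, one homozygous-alternate call.
example_ratio = het_hom_ratio(["0/1", "1|1", "0/0", "./.", "1/0", "0|1"])
print(example_ratio)
```

A sample whose ratio lies far from the cohort distribution may point to contamination or other quality problems, which is presumably why the abstract lists deviations from the expected ratio as a key metric.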
Affiliation(s)
- Domingo Aguilera-Garcia
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Stefan Blankenberg
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Linlin Guo
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Holger Moch
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Dagmar Seidl
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Hugo Solleder
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Felix Thalén
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Raphael Twerenbold
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Tanja Zeller
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Martin Zoche
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Andreas Ziegler
  - Cardio-CARE, Medizincampus Davos, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
2
Abstract
10 years ago, a detailed analysis showed that only 33% of genome-wide association study (GWAS) results included the X chromosome. Multiple recommendations were made to combat such exclusion. Here, we re-surveyed the research landscape to determine whether these earlier recommendations had been translated. Unfortunately, among the genome-wide summary statistics reported in 2021 in the NHGRI-EBI GWAS Catalog, only 25% provided results for the X chromosome and 3% for the Y chromosome, suggesting that the exclusion phenomenon not only persists but has also expanded into an exclusionary problem. Normalizing by physical length of the chromosome, the average number of studies published through November 2022 with genome-wide-significant findings on the X chromosome is ∼1 study/Mb. By contrast, it ranges from ∼6 to ∼16 studies/Mb for chromosomes 4 and 19, respectively. Compared with the autosomal growth rate of ∼0.086 studies/Mb/year over the last decade, studies of the X chromosome grew at less than one-seventh that rate, only ∼0.012 studies/Mb/year. Among the studies that reported significant associations on the X chromosome, we noted extreme heterogeneities in data analysis and reporting of results, suggesting the need for clear guidelines. Unsurprisingly, among the 430 scores sampled from the PolyGenic Score Catalog, 0% contained weights for sex chromosomal SNPs. To overcome the dearth of sex chromosome analyses, we provide five sets of recommendations and future directions. Finally, until the sex chromosomes are included in a whole-genome study, instead of GWASs, we propose such studies would more properly be referred to as "AWASs," meaning "autosome-wide scans."
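The "less than one-seventh" comparison in the abstract above can be checked with a few lines of arithmetic. This is only a sketch using the two growth rates quoted in the abstract; the constant and function names are illustrative and not taken from the paper.

```python
# Growth rates quoted in the abstract, in studies per Mb per year.
AUTOSOMAL_RATE = 0.086
X_RATE = 0.012


def rate_ratio(x_rate, autosomal_rate):
    """Return the X-chromosome growth rate relative to the autosomal rate."""
    return x_rate / autosomal_rate


ratio = rate_ratio(X_RATE, AUTOSOMAL_RATE)
print(f"X / autosomal growth rate: {ratio:.3f}")  # 0.140
print(f"one-seventh:               {1 / 7:.3f}")  # 0.143
```

The ratio of about 0.140 falls just below 1/7 (about 0.143), consistent with the wording of the abstract.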
Affiliation(s)
- Lei Sun
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Zhong Wang
  - Department of Statistics and Data Science, Faculty of Science, National University of Singapore, Singapore
- Tianyuan Lu
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Teri A Manolio
  - Division of Genomic Medicine, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Andrew D Paterson
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
3
Shin HR, Cho WK, Baek IC, Lee NY, Lee YJ, Kim SK, Ahn MB, Suh BK, Kim TG. Polymorphisms of IRAK1 Gene on X Chromosome Is Associated with Hashimoto Thyroiditis in Korean Children. Endocrinology 2020; 161:bqaa088. [PMID: 32498091 DOI: 10.1210/endocr/bqaa088] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/05/2020] [Accepted: 05/28/2020] [Indexed: 02/06/2023]
Abstract
Autoimmune thyroid disease (AITD) is predominant in females, and attention has focused on sex-related differences in the immune response. The IL-1 receptor-associated kinase 1 (IRAK1) gene on the X chromosome was recently suggested as a strong autoimmune disease susceptibility locus, second to the major histocompatibility complex region. We investigated the frequency of IRAK1 single-nucleotide polymorphisms (SNPs) in children with AITD. We examined the IRAK1 SNPs rs3027898, rs1059703, and rs1059702 in 115 Korean pediatric patients with AITD (Graves' disease = 74 [females = 52/males = 22]; Hashimoto disease [HD] = 41 [females = 38/males = 3]; thyroid-associated ophthalmopathy [TAO] = 40 [females = 27/males = 13]; without TAO = 75 [females = 63/males = 12]; total males = 25, total females = 90; mean age = 11.9 years) and 204 healthy Korean individuals (males = 104/females = 100). Data from cases and controls were analyzed both stratified by sex and combined, using the χ² test for categorical variables and Student's t test for numerical variables. IRAK1 SNPs were associated with HD and with the group without TAO, whereas associations with Graves' disease and TAO were not significant. In the sex-stratified analysis, rs3027898 AA, rs1059703 AA, and rs1059702 GG were associated with disease susceptibility in female patients with AITD, with HD, and without TAO. All three SNPs were in strong linkage disequilibrium (D' = 0.96-0.98, r² = 0.83-0.97). The three-SNP haplotype CGA was more frequent in AITD than in controls (χ² = 5.42, P = 0.019). Our results suggest that IRAK1 polymorphisms may contribute to the pathogenesis of AITD, HD, and AITD without thyroid-associated ophthalmopathy in females.
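The linkage disequilibrium measures D' and r² reported above follow from standard textbook definitions for two biallelic loci. The sketch below applies those definitions to haplotype and allele frequencies; the function name `ld_stats` and the toy frequencies are illustrative assumptions, not data or software from the study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or denom == 0:
        return float("nan"), float("nan")  # a locus is monomorphic
    return abs(d) / d_max, d * d / denom


# Toy frequencies: alleles always inherited together (perfect LD).
d_prime, r2 = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d_prime, r2)
```

Values of D' close to 1 with r² somewhat lower, as in the abstract, typically arise when the loci are rarely separated by recombination but their allele frequencies differ.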
Affiliation(s)
- Hye-Ri Shin
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Won Kyoung Cho
  - Department of Pediatrics, College of Medicine, St. Vincent's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- In-Cheol Baek
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Na Yeong Lee
  - Department of Pediatrics, College of Medicine, St. Vincent's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- Yoon Ji Lee
  - Department of Pediatrics, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- Seul Ki Kim
  - Department of Pediatrics, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- Moon Bae Ahn
  - Department of Pediatrics, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- Byung-Kyu Suh
  - Department of Pediatrics, College of Medicine, Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul, Republic of Korea
- Tai-Gyu Kim
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
4
Cho WK, Shin HR, Lee NY, Kim SK, Ahn MB, Baek IC, Kim TG, Suh BK. GPR174 and ITM2A Gene Polymorphisms rs3827440 and rs5912838 on the X chromosome in Korean Children with Autoimmune Thyroid Disease. Genes (Basel) 2020; 11:genes11080858. [PMID: 32727090 PMCID: PMC7465061 DOI: 10.3390/genes11080858] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/09/2020] [Revised: 07/21/2020] [Accepted: 07/22/2020] [Indexed: 12/12/2022]
Abstract
(1) Background: Autoimmune thyroid diseases (AITDs) are female predominant, and much attention has been focused on G protein-coupled receptor 174 (GPR174) and integral membrane protein 2A (ITM2A) on the X chromosome as Graves' disease (GD) susceptibility loci. (2) Methods: We genotyped four single nucleotide polymorphisms (SNPs), rs3810712, rs3810711, rs3827440, and rs5912838, of GPR174 and ITM2A in 115 Korean children with AITD (M = 25 and F = 90; GD = 74 (14.7 ± 3.6 years), HD = 41 (13.4 ± 3.2 years); GD-thyroid-associated ophthalmopathy (TAO) = 40, GD-non-TAO = 34) and 204 healthy Korean individuals (M = 104 and F = 100). Data were analyzed both stratified by sex and combined. (3) Results: Three SNPs, rs3810712, rs3810711, and rs3827440, were found to be in perfect linkage disequilibrium (D' = 1, r² = 1). In AITD, HD, GD, GD-TAO, and GD-non-TAO patients, rs3827440 TT/T and rs5912838 AA/A were susceptibility genotypes and rs3827440 CC/C and rs5912838 CC/C were protective genotypes. When analyzed by sex, rs3827440 TT and rs5912838 AA were susceptibility genotypes and rs3827440 CC and rs5912838 CC were protective genotypes in female AITD, GD, GD-TAO, and GD-non-TAO subjects. In male AITD patients, rs3827440 T and rs5912838 A were susceptibility alleles and rs3827440 C and rs5912838 C were protective alleles. (4) Conclusions: Polymorphisms in GPR174 and ITM2A genes on the X chromosome might be associated with AITD in Korean children.
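The "TT/T" and "CC/C" notation above reflects that females carry two copies of an X-linked SNP while males carry one. A minimal sketch of allele counting under that assumption follows; the function name `x_allele_counts`, the sex codes "F"/"M", and the toy sample are hypothetical and do not reproduce the study's data or analysis.

```python
from collections import Counter


def x_allele_counts(samples):
    """Count alleles of an X-linked SNP.

    samples: iterable of (sex, genotype) pairs, where females ("F")
    have two-letter genotypes such as "TC" and males ("M") have a
    single hemizygous allele such as "T".
    """
    counts = Counter()
    for sex, genotype in samples:
        if sex not in ("F", "M"):
            raise ValueError(f"unknown sex code: {sex!r}")
        expected = 2 if sex == "F" else 1
        if len(genotype) != expected:
            raise ValueError(f"{sex} genotype {genotype!r} needs {expected} allele(s)")
        counts.update(genotype)  # each character is one allele
    return counts


# Toy sample: two females and two males.
toy_counts = x_allele_counts([("F", "TT"), ("F", "TC"), ("M", "T"), ("M", "C")])
print(toy_counts["T"], toy_counts["C"])
```

Counting alleles this way keeps female and male contributions on the same scale when cases and controls are combined, which is one reason X-linked analyses are often also reported stratified by sex.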
Affiliation(s)
- Won Kyoung Cho
  - Department of Pediatrics, College of Medicine, St. Vincent’s Hospital, The Catholic University of Korea, Seoul 065941, Korea
- Hye-Ri Shin
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul 065941, Korea
- Na Yeong Lee
  - Department of Pediatrics, College of Medicine, Seoul St. Mary’s Hospital, The Catholic University of Korea, Seoul 065941, Korea
- Seul Ki Kim
  - Department of Pediatrics, College of Medicine, Seoul St. Mary’s Hospital, The Catholic University of Korea, Seoul 065941, Korea
- Moon Bae Ahn
  - Department of Pediatrics, College of Medicine, Seoul St. Mary’s Hospital, The Catholic University of Korea, Seoul 065941, Korea
- In-Cheol Baek
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul 065941, Korea
- Tai-Gyu Kim
  - Catholic Hematopoietic Stem Cell Bank, College of Medicine, The Catholic University of Korea, Seoul 065941, Korea
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul 065941, Korea
  - Correspondence: Tel.: +82-2-2258-7341 (T.-G.K.); +82-2-2258-6185 (B.-K.S.); Fax: +82-2-594-7355 (T.-G.K.); 82-2-532-6185 (B.-K.S.)
- Byung-Kyu Suh
  - Department of Pediatrics, College of Medicine, Seoul St. Mary’s Hospital, The Catholic University of Korea, Seoul 065941, Korea