1
Lin YS, Tan T, Wang Y, Pasaniuc B, Martin AR, Atkinson EG. Differential performance of polygenic prediction across traits and populations depending on genotype discovery approach. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.18.644029. [PMID: 40166153 PMCID: PMC11957064 DOI: 10.1101/2025.03.18.644029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Polygenic scores (PGS) are widely used for estimating genetic predisposition to complex traits by aggregating the effects of common variants into a single measure. They hold promise in identifying individuals at increased risk for diseases, allowing earlier screening and interventions. Genotyping arrays, commonly used for PGS computation, are affordable and computationally efficient, while whole-genome sequencing (WGS) offers a comprehensive view of genetic variation. Using the same set of individuals, we compared PGS derived from arrays and WGS across multiple traits to evaluate differences in predictive performance, portability across populations, and computational efficiency. We computed PGS for 10 traits across the spectrum of heritability and polygenicity in the three largest genetic ancestry groups in All of Us (European, African American, Admixed American), trained on the multi-ancestry meta-analyses from the Pan-UK Biobank. Using the clumping and thresholding (C+T) method, we found that WGS-based PGS outperformed array-based PGS for highly polygenic traits but showed differentially reduced accuracy for sparse traits in certain populations. This may be attributable to the lower allele frequency observed in clumped variants from WGS compared to arrays. Using the LD-informed PRS-CS method, we observed overall improved prediction performance compared to C+T, with WGS outperforming arrays across most non-cancer traits. In conclusion, while PGS computed using WGS generally provide superior predictive power with PRS-CS, the advantage over arrays is context-dependent, varying by trait, population, and the PGS method. This study provides insights into the complexities and potential advantages of using different genotype discovery approaches for polygenic predictions in diverse populations.
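As a reader's aid, the core arithmetic behind a polygenic score can be sketched as a weighted sum of allele dosages over the variants that pass a p-value threshold. This is a minimal illustration only, not the authors' pipeline: the function name `polygenic_score`, the threshold value, and the toy numbers are all hypothetical, and the LD clumping step of C+T is assumed to have been applied beforehand.

```python
import numpy as np


def polygenic_score(dosages, betas, pvalues, p_threshold=5e-8):
    """Sum effect-weighted allele dosages over variants with p < p_threshold.

    dosages: (n_individuals, n_variants) allele counts in [0, 2]
    betas:   (n_variants,) per-allele effect sizes from a GWAS
    pvalues: (n_variants,) GWAS p-values used for thresholding
    Assumes the variants were already LD-clumped (not done here).
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < p_threshold
    return dosages[:, keep] @ betas[keep]


# Toy, made-up inputs: 2 individuals, 3 variants.
dosages = np.array([[0, 1, 2], [2, 0, 1]])
betas = np.array([0.5, -0.2, 0.1])
pvalues = np.array([1e-9, 0.3, 1e-10])
scores = polygenic_score(dosages, betas, pvalues)
```

With these toy inputs the second variant is dropped by the threshold, so each score depends only on the first and third variants.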
2
Braat S, Fielding KL, Han J, Jackson VE, Zaloumis S, Xu JXH, Moir-Meyer G, Blaauwendraad SM, Jaddoe VWV, Gaillard R, Parkin PC, Borkhoff CM, Keown-Stoneman CDG, Birken CS, Maguire JL, Bahlo M, Davidson EM, Pasricha SR. Haemoglobin thresholds to define anaemia from age 6 months to 65 years: estimates from international data sources. Lancet Haematol 2024; 11:e253-e264. [PMID: 38432242 PMCID: PMC10983828 DOI: 10.1016/s2352-3026(24)00030-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/18/2023] [Revised: 01/25/2024] [Accepted: 01/25/2024] [Indexed: 03/05/2024]
Abstract
BACKGROUND Detection of anaemia is crucial for clinical medicine and public health. Current WHO anaemia definitions are based on statistical thresholds (fifth centiles) set more than 50 years ago. We sought to establish evidence for the statistical haemoglobin thresholds for anaemia that can be applied globally and inform WHO and clinical guidelines. METHODS In this analysis we identified international data sources from populations in the USA, England, Australia, China, the Netherlands, Canada, Ecuador, and Bangladesh with sufficient clinical and laboratory information collected between 1998 and 2020 to obtain a healthy reference sample. Individuals with clinical or biochemical evidence of a condition that could reduce haemoglobin concentrations were excluded. We estimated haemoglobin thresholds (ie, 5th centiles) for children aged 6-23 months, 24-59 months, 5-11 years, and 12-17 years, and adults aged 18-65 years (including during pregnancy) for individual datasets and pooled across data sources. We also collated findings from three large-scale genetic studies to summarise genetic variants affecting haemoglobin concentrations in different ancestral populations. FINDINGS We identified eight data sources comprising 18 individual datasets that were eligible for inclusion in the analysis. In pooled analyses, the haemoglobin fifth centile was 104·4 g/L (90% CI 103·5-105·3) in 924 children aged 6-23 months, 110·2 g/L (109·5-110·9) in 1874 children aged 24-59 months, and 114·4 g/L (113·6-115·2) in 1839 children aged 5-11 years. Values diverged by sex in adolescents and adults. In pooled analyses, the fifth centile was 122·2 g/L (90% CI 121·3-123·1) in 1741 female adolescents aged 12-17 years and 128·2 g/L (126·4-130·0) in 1103 male adolescents aged 12-17 years. In pooled analyses of adults aged 18-65 years, the fifth centile was 119·7 g/L (90% CI 119·1-120·3) in 3640 non-pregnant females and 134·9 g/L (134·2-135·6) in 2377 males. 
Fifth centiles in pregnancy were 110·3 g/L (90% CI 109·5-111·0) in the first trimester (n=772) and 105·9 g/L (104·0-107·7) in the second trimester (n=111), with insufficient data for analysis in the third trimester. There were insufficient data for adults older than 65 years. We did not identify ancestry-specific high prevalence of non-clinically relevant genetic variants that influence haemoglobin concentrations. INTERPRETATION Our results enable global harmonisation of clinical and public health haemoglobin thresholds for diagnosis of anaemia. Haemoglobin thresholds are similar between sexes until adolescence, after which males have higher thresholds than females. We did not find any evidence that thresholds should differ between people of differing ancestries. FUNDING World Health Organization and the Bill & Melinda Gates Foundation.
Affiliation(s)
- Sabine Braat
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Methods and Implementation Support for Clinical and Health research Hub, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Katherine L Fielding
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia; Clinical Haematology, The Austin Hospital, Heidelberg, VIC, Australia
- Jiru Han
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Victoria E Jackson
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Sophie Zaloumis
  - Methods and Implementation Support for Clinical and Health research Hub, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Jessica Xu Hui Xu
  - Methods and Implementation Support for Clinical and Health research Hub, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Gemma Moir-Meyer
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Sophia M Blaauwendraad
  - Generation R Study Group, and Department of Pediatrics, Erasmus University Medical Center, Rotterdam, Netherlands
- Vincent W V Jaddoe
  - Generation R Study Group, and Department of Pediatrics, Erasmus University Medical Center, Rotterdam, Netherlands
- Romy Gaillard
  - Generation R Study Group, and Department of Pediatrics, Erasmus University Medical Center, Rotterdam, Netherlands
- Patricia C Parkin
  - Division of Pediatric Medicine and the Pediatric Outcomes Research Team, The Hospital for Sick Children, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Cornelia M Borkhoff
  - Division of Pediatric Medicine and the Pediatric Outcomes Research Team, The Hospital for Sick Children, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Charles D G Keown-Stoneman
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Unity Health Toronto, Toronto, ON, Canada
- Catherine S Birken
  - Division of Pediatric Medicine and the Pediatric Outcomes Research Team, The Hospital for Sick Children, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Jonathon L Maguire
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Unity Health Toronto, Toronto, ON, Canada
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Eliza M Davidson
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
- Sant-Rayn Pasricha
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia; Diagnostic Haematology, The Royal Melbourne Hospital, Parkville, VIC, Australia; Clinical Haematology, Peter MacCallum Cancer Centre and The Royal Melbourne Hospital, Parkville, VIC, Australia.
3
Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform 2023; 24:bbad297. [PMID: 37587831 PMCID: PMC10516374 DOI: 10.1093/bib/bbad297] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2023] [Revised: 07/05/2023] [Accepted: 07/23/2023] [Indexed: 08/18/2023] Open
Abstract
Structural variants (SVs) are genomic rearrangements that can take many different forms such as copy number alterations, inversions and translocations. During cell development and aging, somatic SVs accumulate in the genome with potentially neutral, deleterious or pathological effects. Generation of somatic SVs is a key mutational process in cancer development and progression. Despite their importance, the detection of somatic SVs is challenging, making them less studied than somatic single-nucleotide variants. In this review, we summarize recent advances in whole-genome sequencing (WGS)-based approaches for detecting somatic SVs at the tissue and single-cell levels and discuss their advantages and limitations. First, we describe the state-of-the-art computational algorithms for somatic SV calling using bulk WGS data and compare the performance of somatic SV detectors in the presence or absence of a matched-normal control. We then discuss the unique features of cutting-edge single-cell-based techniques for analyzing somatic SVs. The advantages and disadvantages of bulk and single-cell approaches are highlighted, along with a discussion of their sensitivity to copy-neutral SVs, usefulness for functional inferences and experimental and computational costs. Finally, computational approaches for linking somatic SVs to their functional readouts, such as those obtained from single-cell transcriptome and epigenome analyses, are illustrated, with a discussion of the promise of these approaches in health and diseases.
Affiliation(s)
- Dohun Yi
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Jin-Wu Nam
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Research Institute for Convergence of Basic Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hyobin Jeong
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
4
Braat S, Fielding K, Han J, Jackson VE, Zaloumis S, Xu JXH, Moir-Meyer G, Blaauwendraad SM, Jaddoe VWV, Gaillard R, Parkin PC, Borkhoff CM, Keown-Stoneman CDG, Birken CS, Maguire JL, Bahlo M, Davidson E, Pasricha SR. Statistical haemoglobin thresholds to define anaemia across the lifecycle. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.05.22.23290129. [PMID: 37292786 PMCID: PMC10246131 DOI: 10.1101/2023.05.22.23290129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Detection of anaemia is critical for clinical medicine and public health. Current WHO values that define anaemia are statistical thresholds (5th centile) set over 50 years ago, and are presently <110g/L in children 6-59 months, <115g/L in children 5-11 years, <110g/L in pregnant women, <120g/L in children 12-14 years of age, <120g/L in non-pregnant women, and <130g/L in men. Haemoglobin is sensitive to iron and other nutrient deficiencies, medical illness and inflammation, and is impacted by genetic conditions; thus, careful exclusion of these conditions is crucial to obtain a healthy reference population. We identified data sources from which sufficient clinical and laboratory information was available to determine an apparently healthy reference sample. Individuals were excluded if they had any clinical or biochemical evidence of a condition that may diminish haemoglobin concentration. Discrete 5th centiles were estimated along with two-sided 90% confidence intervals and estimates combined using a fixed-effect approach. Estimates for the 5th centile of the healthy reference population in children were similar between sexes. Thresholds in children 6-23 months were 104.4g/L [90% CI 103.5, 105.3]; in children 24-59 months were 110.2g/L [109.5, 110.9]; and in children 5-11 years were 114.1g/L [113.2, 115.0]. Thresholds diverged by sex in adolescents and adults. In females and males 12-17 years, thresholds were 122.2g/L [121.3, 123.1] and 128.2g/L [126.4, 130.0], respectively. In adults 18-65 years, thresholds were 119.7g/L [119.1, 120.3] in non-pregnant females and 134.9g/L [134.2, 135.6] in males. Limited analyses indicated 5th centiles in first-trimester pregnancy of 110.3g/L [109.5, 111.0] and 105.9g/L [104.0, 107.7] in the second trimester. All thresholds were robust to variations in definitions and analysis models.
Using multiple datasets comprising Asian, African, and European ancestries, we did not identify novel high prevalence genetic variants that influence haemoglobin concentration, other than variants in genes known to cause important clinical disease, suggesting non-clinical genetic factors do not influence the 5th centile between ancestries. Our results directly inform WHO guideline development and provide a platform for global harmonisation of laboratory, clinical and public health haemoglobin thresholds.
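For readers unfamiliar with fixed-effect pooling, the combination of per-dataset centile estimates mentioned in the abstract can be sketched as inverse-variance weighting. This is a hedged illustration, not the authors' analysis code: it assumes each 90% confidence interval is symmetric and normal-based so that a standard error can be recovered from its width, and the function name `pool_fixed_effect` and the input values are hypothetical (they are not taken from the study).

```python
import math

# Standard normal quantile for a two-sided 90% confidence interval.
Z90 = 1.6448536269514722


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (point, ci_low, ci_high) tuples.

    Assumes symmetric, normal-based 90% CIs, so SE = (high - low) / (2 * Z90).
    Returns (pooled_point, pooled_ci_low, pooled_ci_high).
    """
    weights = []
    weighted_points = []
    for point, low, high in estimates:
        se = (high - low) / (2 * Z90)
        weight = 1.0 / se**2  # more precise datasets get more weight
        weights.append(weight)
        weighted_points.append(weight * point)
    pooled = sum(weighted_points) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - Z90 * pooled_se, pooled + Z90 * pooled_se


# Made-up example: two datasets with equally wide intervals.
pooled, low, high = pool_fixed_effect([(100.0, 98.0, 102.0), (104.0, 102.0, 106.0)])
```

Because the two illustrative intervals have equal width, the pooled point is their simple average, and the pooled interval is narrower than either input.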
5
English AC, Menon VK, Gibbs RA, Metcalf GA, Sedlazeck FJ. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 2022; 23:271. [PMID: 36575487 DOI: 10.1101/2022.02.21.481353] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/21/2022] [Accepted: 12/15/2022] [Indexed: 05/25/2023] Open
Abstract
The fundamental challenge of multi-sample structural variant (SV) analysis such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongside technologies which produce ill-defined boundaries. As SV detection becomes more exact, algorithms to preserve this refined signal are needed. Here, we present Truvari (an SV comparison, annotation, and analysis toolkit) and demonstrate the effect of SV comparison choices by building population-level VCFs from 36 haplotype-resolved long-read assemblies. We observe over-merging from other SV merging approaches which cause up to a 2.2× inflation of allele frequency, relative to Truvari.
Affiliation(s)
- Adam C English
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA.
- Vipin K Menon
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Richard A Gibbs
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Ginger A Metcalf
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
6
English AC, Menon VK, Gibbs RA, Metcalf GA, Sedlazeck FJ. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 2022; 23:271. [PMID: 36575487 PMCID: PMC9793516 DOI: 10.1186/s13059-022-02840-6] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 02/21/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022] Open
Abstract
The fundamental challenge of multi-sample structural variant (SV) analysis such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongside technologies which produce ill-defined boundaries. As SV detection becomes more exact, algorithms to preserve this refined signal are needed. Here, we present Truvari (an SV comparison, annotation, and analysis toolkit) and demonstrate the effect of SV comparison choices by building population-level VCFs from 36 haplotype-resolved long-read assemblies. We observe over-merging from other SV merging approaches which cause up to a 2.2× inflation of allele frequency, relative to Truvari.
Affiliation(s)
- Adam C English
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA.
- Vipin K Menon
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Richard A Gibbs
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Ginger A Metcalf
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA