1
Lalli JL, Bortvin AN, McCoy RC, Werling DM. A T2T-CHM13 recombination map and globally diverse haplotype reference panel improves phasing and imputation. bioRxiv 2025:2025.02.24.639687. [PMID: 40060455 PMCID: PMC11888259 DOI: 10.1101/2025.02.24.639687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
The T2T-CHM13 complete human reference genome contains ~200 Mb of newly resolved sequence, improving read mapping and variant calling compared to GRCh38. However, the benefits of using complete reference genomes in other contexts are unclear. Here, we present a reference T2T-CHM13 recombination map and phased haplotype panel derived from 3202 samples from the 1000 Genomes Project (1KGP). Using published long-read based assemblies as a reference-neutral ground truth, we compared our T2T-CHM13 1KGP panel to the previously released GRCh38 1KGP phased callset. We find that alignment to T2T-CHM13 resulted in 38% fewer assembly-discordant genotypes and 16% fewer switch errors. The largest gains in panel accuracy are observed on chromosome X and in the regions flanking disease-causing CNVs. Simons Genome Diversity Project samples were more accurately imputed when using the T2T-CHM13 panel. Our study demonstrates that use of a T2T-native phased haplotype panel improves statistical phasing and imputation for samples from diverse human populations.
Affiliation(s)
- Joseph L Lalli
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States
| | - Andrew N Bortvin
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States
- These authors jointly supervised this work
| | - Donna M Werling
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States
- These authors jointly supervised this work
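As an aside on the switch-error metric reported in the Lalli et al. abstract above: a switch error is commonly counted whenever the phase orientation relative to a truth haplotype flips between two consecutive heterozygous sites. The sketch below illustrates that general definition only; it is not the authors' pipeline, and the function name `count_switch_errors` and the tuple-based genotype encoding are assumptions made for illustration.

```python
def count_switch_errors(truth, test):
    """Count switch errors between two phasings of the same sites.

    truth, test: lists of (allele_hap1, allele_hap2) tuples, one per site,
    e.g. (0, 1) or (1, 0). Encoding is a hypothetical simplification.
    Returns (number of switches, number of switch opportunities).
    """
    orientations = []
    for t, p in zip(truth, test):
        t, p = tuple(t), tuple(p)
        if t[0] == t[1]:
            continue  # homozygous in truth: carries no phase information
        if set(t) != set(p):
            continue  # genotype-discordant site: not a phasing comparison
        # True when the test phase matches the truth phase at this site
        orientations.append(t == p)

    # A switch occurs when the orientation changes between adjacent het sites
    switches = sum(
        1 for prev, cur in zip(orientations, orientations[1:]) if prev != cur
    )
    opportunities = max(len(orientations) - 1, 0)
    return switches, opportunities


if __name__ == "__main__":
    truth = [(0, 1), (0, 1), (1, 0), (0, 1), (1, 1), (0, 1)]
    test = [(0, 1), (1, 0), (0, 1), (0, 1), (1, 1), (0, 1)]
    switches, opportunities = count_switch_errors(truth, test)
    print(switches, opportunities)
```

Dividing the first returned value by the second gives a switch error rate; a relative reduction such as the one quoted in the abstract would compare that rate between two panels evaluated against the same truth set.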
2
Wang NK, Wiltsie N, Winata HK, Fitz-Gibbon S, Gonzalez AE, Zeltser N, Agrawal R, Oh J, Arbet J, Patel Y, Yamaguchi TN, Boutros PC. StableLift: Optimized Germline and Somatic Variant Detection Across Genome Builds. bioRxiv 2024:2024.10.31.621401. [PMID: 39554127 PMCID: PMC11565985 DOI: 10.1101/2024.10.31.621401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Reference genomes are foundational to modern genomics. Our growing understanding of genome structure leads to continual improvements in reference genomes and new genome "builds" with incompatible coordinate systems. We quantified the impact of genome build on germline and somatic variant calling by analyzing tumour-normal whole-genome pairs against the two most widely used human genome builds. The average individual had a build-discordance of 3.8% for germline SNPs, 8.6% for germline SVs, 25.9% for somatic SNVs and 49.6% for somatic SVs. Build-discordant variants are not simply false-positives: 47% were verified by targeted resequencing. Build-discordant variants were associated with specific genomic and technical features in variant- and algorithm-specific patterns. We leveraged these patterns to create StableLift, an algorithm that predicts cross-build stability with AUROCs of 0.934 ± 0.029. These results call for significant caution in cross-build analyses and for use of StableLift as a computationally efficient solution to mitigate inter-build artifacts.
Affiliation(s)
- Nicholas K. Wang
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Nicholas Wiltsie
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Helena K. Winata
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Sorel Fitz-Gibbon
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Alfredo E. Gonzalez
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Nicole Zeltser
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Raag Agrawal
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Jieun Oh
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Jaron Arbet
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
- Department of Urology, University of California, Los Angeles
| | - Yash Patel
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Takafumi N. Yamaguchi
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
| | - Paul C. Boutros
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
- Department of Urology, University of California, Los Angeles
3
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. Am J Hum Genet 2024; 111:1282-1300. [PMID: 38834072 PMCID: PMC11267525 DOI: 10.1016/j.ajhg.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 05/04/2024] [Accepted: 05/06/2024] [Indexed: 06/06/2024] Open
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network and Genomics Research to Elucidate the Genetics of Rare Disease Consortium. Across six routinely collected biospecimens, 61% of quantified genes were not influenced by genome build. However, we identified 1,492 genes with build-dependent quantification, 3,377 genes with build-exclusive expression, and 9,077 genes with annotation-specific expression across the same biospecimens, including 566 clinically relevant and 512 known OMIM genes. Further, we demonstrate that between builds for a given gene, a larger difference in quantification is well correlated with a larger change in expression outlier calling. Combined, we provide a database of genes impacted by build choice and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.
Affiliation(s)
- Rachel A Ungar
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Pagé C Goddard
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Tanner D Jensen
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | | | - Kevin S Smith
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Christopher A Jin
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Devon E Bonner
- Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA; Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
| | - Jonathan A Bernstein
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
| | - Matthew T Wheeler
- Department of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Stephen B Montgomery
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
4
Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N, Chao KR, Walker MA, Lyu Y, Rehm HL, Neale BM, Talkowski ME, Daly MJ, Brand H, Karczewski KJ, Atkinson EG, Martin AR. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res 2024; 34:796-809. [PMID: 38749656 PMCID: PMC11216312 DOI: 10.1101/gr.278378.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 05/07/2024] [Indexed: 05/18/2024]
Abstract
Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
Affiliation(s)
- Zan Koenig
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Mary T Yohannes
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Lethukuthula L Nkambule
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Julia K Goodrich
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Heesu Ally Kim
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael W Wilson
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Stephanie P Hao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Nareh Sahakian
- Broad Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02141, USA
| | - Katherine R Chao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Mark A Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Yunfei Lyu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Benjamin M Neale
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael E Talkowski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Mark J Daly
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Institute for Molecular Medicine Finland, 00290 Helsinki, Finland
| | - Harrison Brand
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Konrad J Karczewski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Elizabeth G Atkinson
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Alicia R Martin
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
5
Cerdán-Vélez D, Tress ML. The T2T-CHM13 reference assembly uncovers essential WASH1 and GPRIN2 paralogues. Bioinformatics Advances 2024; 4:vbae029. [PMID: 38464973 PMCID: PMC10924726 DOI: 10.1093/bioadv/vbae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/02/2024] [Accepted: 02/26/2024] [Indexed: 03/12/2024]
Abstract
Summary: The recently published T2T-CHM13 reference assembly completed the annotation of the final 8% of the human genome. It introduced 1956 genes, close to 100 of which are predicted to be coding because they have a protein coding parent gene. Here, we confirm the coding status and functional relevance of two of these genes, paralogues of WASHC1 and GPRIN2. We find that LOC124908094, one of four novel subtelomeric WASH1 genes uncovered in the new assembly, produces the WASH1 protein that forms part of the vital actin-regulatory WASH complex. Its coding status is supported by abundant proteomics, conservation, and cDNA evidence. It was previously assumed that gene WASHC1 produced the functional WASH1 protein, but new evidence shows that WASHC1 is a human-derived duplication and likely to be one of 12 WASH1 pseudogenes in the human gene set. We also find that the T2T-CHM13 assembly has added a functionally important copy of GPRIN2 to the human gene set. We demonstrate that uniquely mapping peptides from proteomics databases support the novel LOC124900631 rather than the GRCh38 assembly GPRIN2 gene. These new additions to the set of human coding genes underline the importance of the new T2T-CHM13 assembly. Availability and implementation: None.
Affiliation(s)
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Michael Liam Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
6
Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N, Chao KR, Walker MA, Lyu Y, gnomAD Project Consortium, Rehm HL, Neale BM, Talkowski ME, Daly MJ, Brand H, Karczewski KJ, Atkinson EG, Martin AR. A harmonized public resource of deeply sequenced diverse human genomes. bioRxiv 2024:2023.01.23.525248. [PMID: 36747613 PMCID: PMC9900804 DOI: 10.1101/2023.01.23.525248] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Indexed: 01/25/2023]
Abstract
Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
Affiliation(s)
- Zan Koenig
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Mary T. Yohannes
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lethukuthula L. Nkambule
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Julia K. Goodrich
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Heesu Ally Kim
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Michael W. Wilson
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Stephanie P. Hao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Nareh Sahakian
- Broad Genomics, The Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA, 02141, USA
| | - Katherine R. Chao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Mark A. Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yunfei Lyu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | | | - Heidi L. Rehm
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Benjamin M. Neale
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Michael E. Talkowski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Mark J. Daly
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute for Molecular Medicine Finland, Helsinki, Finland
| | - Harrison Brand
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Konrad J. Karczewski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Elizabeth G. Atkinson
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Alicia R. Martin
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
7
Kwon R, Yeung CCS. Advances in next-generation sequencing and emerging technologies for hematologic malignancies. Haematologica 2024; 109:379-387. [PMID: 37584286 PMCID: PMC10828783 DOI: 10.3324/haematol.2022.282442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 08/17/2023] [Indexed: 08/17/2023] Open
Abstract
Innovations in molecular diagnostics have often evolved through the study of hematologic malignancies. Examples include the pioneering characterization of the Philadelphia chromosome by cytogenetics in the 1970s, the implementation of polymerase chain reaction for high-sensitivity detection and monitoring of mutations and, most recently, targeted next-generation sequencing to drive the prognostic and therapeutic assessment of leukemia. Hematologists and hematopathologists have continued to advance in the past decade with new innovations improving the type, amount, and quality of data generated for each molecule of nucleic acid. In this review article, we touch on these new developments and discuss their implications for diagnostics in hematopoietic malignancies. We review advances in sequencing platforms and library preparation chemistry that can lead to faster turnaround times, novel sequencing techniques, the development of mobile laboratories with implications for worldwide benefits, the current status of sample types, improvements to quality and reference materials, bioinformatic pipelines, and the integration of machine learning and artificial intelligence into molecular diagnostic tools for hematologic malignancies.
Affiliation(s)
- Regina Kwon
- Department of Laboratory Medicine and Pathology, University of Washington
| | - Cecilia C. S. Yeung
- Department of Laboratory Medicine and Pathology, University of Washington
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
8
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Undiagnosed Diseases Network, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. medRxiv 2024:2024.01.11.24301165. [PMID: 38260490 PMCID: PMC10802764 DOI: 10.1101/2024.01.11.24301165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that only have detectable expression in a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.
Collapse
Affiliation(s)
- Rachel A. Ungar
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Pagé C. Goddard
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | - Tanner D. Jensen
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
| | | | - Kevin S. Smith
- Department of Pathology, School of Medicine, Stanford University
| | | | | | - Devon E. Bonner
- Department of Pediatrics, School of Medicine, Stanford University
- Stanford Center for Undiagnosed Diseases, Stanford University
| | | | - Matthew T. Wheeler
- Department of Cardiovascular Medicine, School of Medicine, Stanford University
| | - Stephen B. Montgomery
- Department of Genetics, School of Medicine, Stanford University
- Department of Pathology, School of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
|
9
|
Genovese G, Rockweiler NB, Gorman BR, Bigdeli TB, Pato MT, Pato CN, Ichihara K, McCarroll SA. BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics 2024; 40:btae038. [PMID: 38261650 PMCID: PMC10832354 DOI: 10.1093/bioinformatics/btae038] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/16/2023] [Revised: 12/07/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] Open
Abstract
MOTIVATION Many genetics studies report results tied to genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers are faced with either realigning raw sequence data using the updated coordinate system or converting legacy datasets to the updated coordinate system to be able to combine results with newer datasets. Currently available tools to perform the conversion of genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, that lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates. RESULTS Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format, with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It further supports updates to variant annotation fields whenever the reference allele changes across genome assemblies. The tool has the lowest rate of variants being dropped, with an order of magnitude fewer indels dropped or incorrectly converted, and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies as well as summary statistics from genome-wide association studies tied to legacy genome assemblies. AVAILABILITY AND IMPLEMENTATION The tool is written in C and freely available under the MIT open source license as a BCFtools plugin available at http://github.com/freeseek/score.
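The idea of converting a variant between assemblies, including the case where the reference allele changes, can be illustrated with a toy sketch. This is not the BCFtools/liftover implementation: the `Block` type, the `lift_position` and `lift_snv` helpers, and all coordinates are hypothetical, only forward-strand single-nucleotide variants are handled, and real tools work from chain files and also handle indels and multi-allelic records.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Block(NamedTuple):
    """One ungapped, forward-strand aligned block (hypothetical values)."""
    src_start: int  # 0-based start on the source assembly
    dst_start: int  # 0-based start on the target assembly
    length: int     # number of aligned bases


def lift_position(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Map a 0-based source position to the target, or None if unaligned."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.length:
            return b.dst_start + (pos - b.src_start)
    return None


def lift_snv(pos: int, ref: str, alt: str, blocks: Sequence[Block],
             target_seq: str) -> Optional[Tuple[int, str, str, bool]]:
    """Lift one SNV; returns (new_pos, new_ref, new_alt, swapped) or None."""
    new_pos = lift_position(pos, blocks)
    if new_pos is None:
        return None  # position falls outside every aligned block
    target_base = target_seq[new_pos]
    if target_base == ref:
        return new_pos, ref, alt, False
    if target_base == alt:
        # The old alternate allele is the new reference: swap the alleles.
        # Allele-dependent annotations would also need updating here.
        return new_pos, alt, ref, True
    return None  # neither allele matches the target reference base


blocks = [Block(src_start=10, dst_start=4, length=5)]
target_seq = "ACGTACGTAC"
same_ref = lift_snv(12, "G", "T", blocks, target_seq)
swapped_ref = lift_snv(12, "T", "G", blocks, target_seq)
print(same_ref, swapped_ref)
```

The `swapped` flag marks records whose genotype-dependent fields (for example allele counts) would need recomputing, which is the kind of annotation update the abstract describes.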
Affiliation(s)
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Nicole B Rockweiler
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Bryan R Gorman
- Center for Data and Computational Sciences, VA Boston HealthCare System, Boston, MA 02130, United States
- Booz Allen Hamilton Inc, McLean, VA 22102, United States
| | - Tim B Bigdeli
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY 11203, United States
- Cooperative Studies Program, VA New York Harbor Healthcare System, Brooklyn, NY 11209, United States
| | - Michelle T Pato
- Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
| | - Carlos N Pato
- Department of Psychiatry, Robert Wood Johnson Medical School, New Brunswick, NJ 08901, United States
| | - Kiku Ichihara
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Genetics, Harvard Medical School, Boston, MA 02115, United States
|
10
|
Chen NC, Paulin LF, Sedlazeck FJ, Koren S, Phillippy AM, Langmead B. Improved sequence mapping using a complete reference genome and lift-over. Nat Methods 2024; 21:41-49. [PMID: 38036856 PMCID: PMC11610747 DOI: 10.1038/s41592-023-02069-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/27/2022] [Accepted: 10/09/2023] [Indexed: 12/02/2023]
Abstract
Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, Genome Reference Consortium (GRC)h38) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are of lower quality.
Affiliation(s)
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
|
11
|
Ye Y, Maroney KJ, Wiener HW, Mamaeva OA, Junkins AD, Burkholder GA, Sudenga SL, Khushman M, Al Diffalha S, Bansal A, Shrestha S. RNA-seq analysis identifies transcriptomic profiles associated with anal cancer recurrence among people living with HIV. Ann Med 2023; 55:2199366. [PMID: 37177979 PMCID: PMC10184583 DOI: 10.1080/07853890.2023.2199366] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/11/2022] [Revised: 12/17/2022] [Accepted: 03/31/2023] [Indexed: 05/15/2023] Open
Abstract
BACKGROUND Chemoradiation therapy (CRT) is the standard of care for squamous cell carcinoma of the anus (SCCA), the most common type of anal cancer. However, approximately one fourth of patients still relapse after CRT. METHODS We used RNA-sequencing technology to characterize coding and non-coding transcripts in tumor tissues from CRT-treated SCCA patients and compare them between 9 non-recurrent and 3 recurrent cases. RNA was extracted from FFPE tissues. Library preparations for RNA-sequencing were created using the SMARTer Stranded Total RNA-Seq Kit. All libraries were pooled and sequenced on a NovaSeq 6000. Function and pathway enrichment analysis was performed with Metascape, and enrichment of gene ontology (GO) was performed with Gene Set Enrichment Analysis (GSEA). RESULTS There were 449 differentially expressed genes (DEGs) observed (390 mRNA, 12 miRNA, 17 lincRNA and 18 snRNA) between the two groups. We identified a core of upregulated genes (IL4, CD40LG, ICAM2, HLA-I (HLA-A, HLA-C) and HLA-II (HLA-DQA1, HLA-DRB5)) in the non-recurrent SCCA tissue enriching to the gene ontology term 'allograft rejection', which suggests a CD4+ T cell driven immune response. Conversely, in the recurrent tissues, keratin (KRT1, 10, 12, 20) and hedgehog signaling pathway (PTCH2) genes involved in 'epidermis development' were significantly upregulated. We identified miR-4316, which inhibits tumor proliferation and migration by repressing vascular endothelial growth factors, as being upregulated in non-recurrent SCCA. In contrast, lncRNA-SOX21-AS1, implicated in the progression of many other cancers, was found to be more common in recurrent than in non-recurrent SCCA. CONCLUSIONS Our study identified key host factors which may drive the recurrence of SCCA and warrants further studies to understand the mechanism and evaluate their potential use in personalized treatment.
KEY MESSAGE Our study used RNA sequencing (RNA-seq) to identify pivotal factors in coding and non-coding transcripts which differentiate between patients at risk for recurrent anal cancer after treatment. There were 449 differentially expressed genes (390 mRNA, 12 miRNA, 17 lincRNA and 18 snRNA) between 9 non-recurrent and 3 recurrent squamous cell carcinoma of the anus (SCCA) tissues. The enrichment of genes related to allograft rejection was observed in the non-recurrent SCCA tissues, while the enrichment of genes related to epidermis development was positively linked with recurrent SCCA tissues.
Affiliation(s)
- Yuanfan Ye
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, AL, USA
| | - Kevin J. Maroney
- Department of Medicine, Division of Infectious Diseases, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Howard W. Wiener
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, AL, USA
| | - Olga A. Mamaeva
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, AL, USA
| | - Anna D. Junkins
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, AL, USA
| | - Greer A. Burkholder
- Department of Medicine, Division of Infectious Diseases, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Staci L. Sudenga
- Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Mohd Khushman
- O’Neal Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Sameer Al Diffalha
- Department of Pathology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Anju Bansal
- Department of Medicine, Division of Infectious Diseases, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Sadeep Shrestha
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, AL, USA
|
12
|
Park KJ, Yoon YA, Park JH. Evaluation of Liftover Tools for the Conversion of Genome Reference Consortium Human Build 37 to Build 38 Using ClinVar Variants. Genes (Basel) 2023; 14:1875. [PMID: 37895222 PMCID: PMC10606611 DOI: 10.3390/genes14101875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/16/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023] Open
Abstract
Although Genome Reference Consortium Human Build 38 (GRCh38) was released with improvements over GRCh37, it has not been widely adopted. Several liftover tools have been developed as a convenient approach for GRCh38 implementation. This study aimed to investigate the accuracy of liftover tools for genome conversion. Two Variant Call Format (VCF) files aligned to GRCh37 and GRCh38 were downloaded from ClinVar (clinvar_20221217.vcf.gz). Liftover tools such as CrossMap, NCBI Remap, and UCSC liftOver were used to convert genome coordinates from GRCh37 to GRCh38. The accuracies of CrossMap, NCBI Remap, and UCSC liftOver were 99.81% (1,567,838/1,570,748), 99.69% (1,565,953/1,570,748), and 99.99% (1,570,550/1,570,748), respectively. Variants that failed conversion via all three liftover tools were all indels/duplications: a pathogenic/likely pathogenic variant (n = 1) and benign/likely benign variants (n = 7). The eight variants that failed conversion were identified in the ALMS, TTN, CFTR, SLCO, LDLR, PCNT, MID1, and GRIA3 genes, and none of these variants were in the VCF files aligned to GRCh37. This study demonstrated that three liftover tools could successfully convert reference genomes from GRCh37 to GRCh38 in more than 99% of ClinVar variants. This study takes the first step to clinically implement GRCh38 using liftover tools. Further clinical studies are warranted to compare the performance of liftover tools and to validate re-alignment approaches in routine clinical settings.
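The reported accuracies follow directly from the converted and total variant counts given in the abstract. A minimal sketch of that arithmetic is shown here; the counts are taken from the abstract, while the `accuracy_percent` helper and the dictionary layout are illustrative assumptions.

```python
# (correctly converted variants, total ClinVar variants), as quoted in the abstract
counts = {
    "CrossMap": (1_567_838, 1_570_748),
    "NCBI Remap": (1_565_953, 1_570_748),
    "UCSC liftOver": (1_570_550, 1_570_748),
}


def accuracy_percent(converted: int, total: int) -> float:
    """Return conversion accuracy as a percentage rounded to two decimals."""
    return round(100 * converted / total, 2)


accuracies = {tool: accuracy_percent(c, t) for tool, (c, t) in counts.items()}
# Variants each tool did not convert correctly (total minus converted).
unconverted = {tool: t - c for tool, (c, t) in counts.items()}

for tool in counts:
    print(f"{tool}: {accuracies[tool]:.2f}% ({unconverted[tool]} not converted)")
```

The complementary counts make the differences between tools easier to see than the percentages alone, since all three tools exceed 99%.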
Affiliation(s)
- Kyoung-Jin Park
- Department of Laboratory Medicine & Genetics, Samsung Changwon Hospital, Sungkyunkwan University School of Medicine, Changwon 51353, Republic of Korea
| | - Young Ahn Yoon
- Department of Laboratory Medicine, Soonchunhyang University Cheonan Hospital, Soonchunhyang University College of Medicine, Cheonan 31151, Republic of Korea
| | - Jong-Ho Park
- Clinical Genomics Center, Samsung Medical Center, Seoul 06351, Republic of Korea
|
13
|
Foreman J, Perrett D, Mazaika E, Hunt SE, Ware JS, Firth HV. DECIPHER: Improving Genetic Diagnosis Through Dynamic Integration of Genomic and Clinical Data. Annu Rev Genomics Hum Genet 2023; 24:151-176. [PMID: 37285546 PMCID: PMC7615097 DOI: 10.1146/annurev-genom-102822-100509] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Affiliation(s)
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Daniel Perrett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Erica Mazaika
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - James S Ware
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Royal Brompton and Harefield Hospitals, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, United Kingdom
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
|
14
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogenous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" and subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near-term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.
Affiliation(s)
- Vincent A Laufer
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas W Glover
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| | - Thomas E Wilson
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
|
15
|
Lansdon LA, Cadieux-Dion M, Herriges JC, Johnston J, Yoo B, Alaimo JT, Thiffault I, Miller N, Cohen ASA, Repnikova EA, Zhang L, Farooqi MS, Farrow EG, Saunders CJ. Clinical Validation of Genome Reference Consortium Human Build 38 in a Laboratory Utilizing Next-Generation Sequencing Technologies. Clin Chem 2022; 68:1177-1183. [DOI: 10.1093/clinchem/hvac113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2022] [Accepted: 05/31/2022] [Indexed: 01/02/2023]
Abstract
Background
Laboratories utilizing next-generation sequencing align sequence data to a standardized human reference genome (HRG). Several updated versions, or builds, have been released since the original HRG in 2001, including the Genome Reference Consortium Human Build 38 (GRCh38) in 2013. However, most clinical laboratories still use GRCh37, which was released in 2009. We report our laboratory’s clinical validation of GRCh38.
Methods
Migration to GRCh38 was validated by comparing the coordinates (lifting over) of 9443 internally curated variants from GRCh37 to GRCh38, globally comparing protein coding sequence variants aligned with GRCh37 vs GRCh38 from 917 exomes, assessing genes with known discrepancies, comparing coverage differences, and establishing the analytic sensitivity and specificity of variant detection using Genome in a Bottle data.
Results
Eight discrepancies, due to strand swap or reference base, were observed. Three clinically relevant variants had the GRCh37 alternate allele as the reference allele in GRCh38. A comparison of 88 295 calls between builds identified 8 disease-associated genes with sequence differences: ABO, BNC2, KIZ, NEFL, NR2E3, PTPRQ, SHANK2, and SRD5A2. Discrepancies in coding regions in GRCh37 were resolved in GRCh38.
Conclusions
There were a small number of clinically significant changes between the 2 genome builds. GRCh38 provided improved detection of nucleotide changes due to the resolution of discrepancies present in GRCh37. Implementation of GRCh38 results in more accurate and consistent reporting.
Affiliation(s)
- Lisa A Lansdon
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Maxime Cadieux-Dion
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
| | - John C Herriges
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Jeffrey Johnston
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
| | - Byunggil Yoo
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
| | - Joseph T Alaimo
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Isabelle Thiffault
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Neil Miller
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
- Bionano Genomics, Inc., 9540 Towne Centre Dr., Suite 100, San Diego, CA, USA
| | - Ana S A Cohen
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Elena A Repnikova
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Lei Zhang
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
| | - Midhat S Farooqi
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
| | - Emily G Farrow
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
- Department of Pediatrics, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
| | - Carol J Saunders
- Department of Pathology and Laboratory Medicine, Children’s Mercy—Kansas City, 2401 Gillham Rd., Kansas City, MO, USA
- Genomic Medicine Center, Children’s Mercy Research Institute—Kansas City, 2420 Pershing Rd. Suite 100, Kansas City, MO, USA
- School of Medicine, University of Missouri-Kansas City, 2411 Holmes St., Kansas City, MO, USA
|
16
|
Birney E. The International Human Genome Project. Hum Mol Genet 2021; 30:R161-R163. [PMID: 34264324 PMCID: PMC8490009 DOI: 10.1093/hmg/ddab198] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/07/2021] [Revised: 07/08/2021] [Accepted: 07/09/2021] [Indexed: 12/01/2022] Open
Abstract
The human genome project was conceived and executed as an international project, due to both pragmatic and principled reasons. This internationality has served the project well, with the resulting human genome being freely available for all researchers in all countries. Over time the reference human genome will likely have to evolve to a graph genome, and tap into more diverse sequences worldwide. A similar international mindset underpins data analysis for the interpretation of the human genome from basic to clinical research.
Affiliation(s)
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
|