1
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. Genome Res 2025; 35:914-928. [PMID: 40113264 PMCID: PMC12047269 DOI: 10.1101/gr.279323.124] [Received: 03/15/2024] [Accepted: 01/06/2025] [Indexed: 03/22/2025]
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) in whom short-read sequencing had not identified a diagnostic mutation. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected on average 716 rare (MAF < 0.01) SV alleles per genome from long reads, a 2.4× increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression in blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched near expression outliers (log odds ratio [LOR] = 0.46). We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, these included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
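This abstract contains two quantitative criteria. The first is the rare-allele cutoff, MAF < 0.01 against the 571 control genomes. The second is the enrichment reported as a log odds ratio. Both can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the authors' pipeline or the Watershed-SV model:

- The helper names are hypothetical.
- Controls are assumed to be diploid, giving 2 × 571 = 1,142 alleles.
- The LOR is taken as the natural log of a plain 2×2 odds ratio.

The paper's own enrichment estimate may come from a different model.

```python
import math


def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Hypothetical helper: frequency of the less common allele among control alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def is_rare(alt_count: int, total_alleles: int, threshold: float = 0.01) -> bool:
    """Hypothetical helper: an allele is 'rare' if its control MAF is strictly below the threshold.

    An SV never observed in controls (alt_count == 0) counts as rare.
    """
    return minor_allele_frequency(alt_count, total_alleles) < threshold


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Natural-log odds ratio of a 2x2 table (assumed layout):

                          near outlier   not near outlier
        SV in enhancer         a                b
        SV not in enhancer     c                d

    A positive value means SVs in enhancers are more often near outliers.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the odds ratio is undefined without a pseudocount")
    return math.log((a * d) / (b * c))
```

As an illustration with made-up counts, consider an SV seen on 5 of 1,142 control alleles (frequency ≈ 0.0044). It passes `is_rare`. An SV seen on 20 alleles (≈ 0.0175) does not.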
Affiliation(s)
- Tanner D Jensen
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Bohan Ni
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Chloe M Reuter
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- John E Gorzynski
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33136, USA
- Devon Bonner
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Division of Medical Genetics, Stanford University School of Medicine, Stanford, California 94304, USA
- Rachel A Ungar
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Pagé C Goddard
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Archana Raja
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Euan A Ashley
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Jonathan A Bernstein
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California 94304, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33136, USA
- Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Department of Pathology, Stanford University, Stanford, California 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Matthew T Wheeler
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- GREGoR Stanford Site, Stanford University, Stanford, California 94305, USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21218, USA
2
Gong T, Jiang J, Uthayopas K, Bornman MSR, Gheybi K, Stricker PD, Weischenfeldt J, Mutambirwa SBA, Jaratlerdsiri W, Hayes VM. Rare pathogenic structural variants show potential to enhance prostate cancer germline testing for African men. Nat Commun 2025; 16:2400. [PMID: 40064858 PMCID: PMC11893795 DOI: 10.1038/s41467-025-57312-9] [Received: 06/05/2024] [Accepted: 02/18/2025] [Indexed: 03/14/2025]
Abstract
Prostate cancer (PCa) is highly heritable, with men of African ancestry at greatest risk and associated lethality. Because African men are underrepresented in genomic data, germline testing guidelines do not account for them. Although structural variations (SVs) are established as major contributors to human disease and prostate tumourigenesis, their role is under-appreciated in familial and therapeutic testing. Utilising clinico-methodologically matched deep-sequenced whole-genome data for 113 African versus 57 European PCa patients, we interrogate 42,966 high-quality germline SVs using a best-fit pathogenicity prediction workflow. We identify 15 potentially pathogenic SVs carried by 12.4% of African and 7.0% of European patients, of which 72% and 86%, respectively, met germline testing standard-of-care recommendations. Notable African-specific loss-of-function gene candidates include the DNA damage repair genes MLH1 and BARD1 and the tumour suppressors FOXP1, WASF1 and RB1. Although it represents only a fraction of the vast African diaspora, this study raises questions about the contribution of kilobase-to-megabase rare variants to PCa pathogenicity and African-associated disparity.
Affiliation(s)
- Tingting Gong
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- Human Phenome Institute, Fudan University, Shanghai, China
- Jue Jiang
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- Korawich Uthayopas
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- M S Riana Bornman
- School of Health Systems and Public Health, University of Pretoria, Pretoria, South Africa
- Kazzem Gheybi
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- Joachim Weischenfeldt
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark
- Biotech Research & Innovation Centre, University of Copenhagen, DK-2200, Copenhagen, Denmark
- Shingai B A Mutambirwa
- Department of Urology, Sefako Makgatho Health Science University, Dr George Mukhari Academic Hospital, Medunsa, Ga-Rankuwa, South Africa
- Weerachai Jaratlerdsiri
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- Vanessa M Hayes
- Ancestry and Health Genomics Laboratory, Charles Perkins Centre, School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW, 2050, Australia
- School of Health Systems and Public Health, University of Pretoria, Pretoria, South Africa
- Manchester Cancer Research Centre, University of Manchester, Manchester, M20 4GJ, UK
3
Liu X, Gu L, Hao C, Xu W, Leng F, Zhang P, Li W. Systematic assessment of structural variant annotation tools for genomic interpretation. Life Sci Alliance 2025; 8:e202402949. [PMID: 39658089 PMCID: PMC11632063 DOI: 10.26508/lsa.202402949] [Received: 07/17/2024] [Revised: 11/30/2024] [Accepted: 12/02/2024] [Indexed: 12/12/2024]
Abstract
Structural variants (SVs) over 50 base pairs play a significant role in phenotypic diversity and are associated with various diseases, but their analysis is complex and resource-intensive. Numerous computational tools have been developed for SV prioritization, yet their effectiveness in biomedicine remains unclear. Here we benchmarked eight widely used SV prioritization tools, categorized into knowledge-driven (AnnotSV, ClassifyCNV) and data-driven (CADD-SV, dbCNV, StrVCTVRE, SVScore, TADA, XCNV) groups in accordance with the ACMG guidelines. Using seven carefully curated independent datasets, we assessed their accuracy, robustness, and usability across diverse genomic contexts and biological mechanisms, as well as their computational efficiency. Our results revealed that both groups of methods predict SV pathogenicity with comparable effectiveness, although performance varies among tools, underscoring the importance of selecting a tool suited to the specific research purpose. Furthermore, we identified potential improvements for extending these tools to future applications. Our benchmarking framework provides a crucial evaluation method for SV analysis tools, offering practical guidance for biomedical research and facilitating the development of better genomic research tools.
Affiliation(s)
- Xuanshi Liu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Lei Gu
- Epigenetics Laboratory, Max-Planck Institute for Heart and Lung Research, Cardiopulmonary Institute, Bad Nauheim, Germany
- Chanjuan Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wenjian Xu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Fei Leng
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wei Li
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
4
Chen X, Wei S, Sun C, Yi Z, Wang Z, Wu Y, Xu J, Tao J, Chen H, Zhang M, Jiang Y, Lv H, Huang C. Computational Tools for Studying Genome Structural Variation. OMICS: A Journal of Integrative Biology 2025; 29:36-48. [PMID: 39905890 DOI: 10.1089/omi.2024.0200] [Indexed: 02/06/2025]
Abstract
Structural variation (SV) typically refers to alterations of DNA segments at least 50 base pairs long in the human genome. A single SV can alter thousands of nucleotides and thus significantly influence human health, disease, and clinical phenotypes. There is shared and growing recognition that effective computational tools and high-throughput technologies, such as short-read and long-read sequencing, offer novel insight into SV and, by extension, into diseases affecting planetary health. However, the many available SV tools have varying strengths and weaknesses, which currently hampers researchers' ability to select the optimal tools for studying SVs. Here, we reviewed 175 tools developed over the past two decades for SV detection, annotation, visualization, and downstream analysis in human genomics. In this expert review, we provide a comprehensive catalog of SV-related tools across different technology platforms and summarize their features, strengths, and limitations, with an eye to accelerating systems science and planetary health innovations.
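The size-based definition above (at least 50 bp) can be sketched for sequence-resolved insertions and deletions. This is a minimal sketch under assumptions:

- The variant is given as VCF-style REF/ALT sequence strings.
- The helper names are hypothetical.

Many SV callers instead emit symbolic alleles such as `<DEL>` and record the length in the `SVLEN` INFO field. Inversions and translocations change no length at all. The sketch therefore covers only the simplest, length-changing case.

```python
SV_MIN_LENGTH = 50  # base pairs; follows the "at least 50 bp" definition


def variant_length(ref: str, alt: str) -> int:
    """Hypothetical helper: absolute length change implied by a REF/ALT pair."""
    return abs(len(alt) - len(ref))


def classify(ref: str, alt: str) -> str:
    """Hypothetical helper: label a length-changing variant by size.

    Balanced events (inversions, translocations) and symbolic alleles
    need breakpoint- or INFO-aware logic that is not shown here.
    """
    if variant_length(ref, alt) < SV_MIN_LENGTH:
        return "small_variant"  # SNVs and indels shorter than 50 bp
    # ALT longer than REF means inserted sequence; shorter means deleted sequence
    return "SV_insertion" if len(alt) > len(ref) else "SV_deletion"
```

For instance, deleting 49 bases after an anchor base stays below the threshold. Deleting 50 bases is labelled an SV deletion.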
Affiliation(s)
- Xingyu Chen
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Siyu Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zelin Yi
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Zihan Wang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Yingyi Wu
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Jing Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junxian Tao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haiyan Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingming Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Huang
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
5
Hayes V, Gong T, Jiang J, Bornman R, Gheybi K, Stricker P, Weischenfeldt J, Mutambirwa S. Rare pathogenic structural variants show potential to enhance prostate cancer germline testing for African men. Research Square 2024:rs.3.rs-4531885. [PMID: 38947031 PMCID: PMC11213160 DOI: 10.21203/rs.3.rs-4531885/v1] [Indexed: 07/02/2024]
Abstract
Prostate cancer (PCa) is highly heritable, with men of African ancestry at greatest risk and associated lethality. Because African men are underrepresented in genomic data, germline testing guidelines do not account for them. Although structural variations (SVs) are established as major contributors to human disease and prostate tumourigenesis, their role is under-appreciated in familial and therapeutic testing. Utilising a clinico-methodologically matched African (n = 113) versus European (n = 57) deep-sequenced PCa resource, we interrogated 42,966 high-quality germline SVs using a best-fit pathogenicity prediction workflow. We identified 15 potentially pathogenic SVs carried by 12.4% of African and 7.0% of European patients, of which 72% and 86%, respectively, met germline testing standard-of-care recommendations. Notable African-specific loss-of-function gene candidates include the DNA damage repair genes MLH1 and BARD1 and the tumour suppressors FOXP1, WASF1 and RB1. Although it represents only a fraction of the vast African diaspora, this study raises questions about the contribution of kilobase-to-megabase rare variants to PCa pathogenicity and African-associated disparity.
Affiliation(s)
- Jue Jiang
- Garvan Institute of Medical Research
6
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Undiagnosed Diseases Network, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv: The Preprint Server for Health Sciences 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) in whom short-read sequencing had not identified a diagnostic mutation. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected on average 716 rare (MAF < 0.01) SV alleles per genome from long reads, a 2.4× increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression in blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched near expression outliers (log odds ratio [LOR] = 0.46). We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, these included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.