1. Zeng T, Liao H, Xia L, You S, Huang Y, Zhang J, Liu Y, Liu X, Xie D. Multisite long-read sequencing reveals the early contributions of somatic structural variations to HBV-related hepatocellular carcinoma tumorigenesis. Genome Res 2025; 35:671-685. [PMID: 40037842 PMCID: PMC12047258 DOI: 10.1101/gr.279617.124] [Received: 05/23/2024] [Accepted: 01/30/2025] [Indexed: 03/06/2025]
Abstract
Somatic structural variations (SVs) represent a critical category of genomic mutations in hepatocellular carcinoma (HCC). However, the accurate identification of somatic SVs using short-read high-throughput sequencing is challenging. Here, we applied long-read nanopore sequencing and multisite sampling in a cohort of 42 samples from five patients. We found that adjacent nontumor tissue is not entirely normal, as significant somatic SV alterations were detected in these nontumor genomes. The adjacent nontumor tissue is highly similar to tumor tissue in terms of somatic SVs but differs in somatic single-nucleotide variants and copy number variations. The types of SVs in adjacent nontumor and tumor tissue are markedly different, with somatic insertions and deletions identified as early genomic events associated with HCC. Notably, hepatitis B virus (HBV) DNA integration frequently results in the generation of somatic SVs, particularly inducing interchromosomal translocations (TRAs). Although HBV DNA integration into the liver genome occurs randomly, multisite shared HBV-induced SVs are early driving events in the pathogenesis of HCC. Long-read RNA sequencing reveals that some HBV-induced SVs impact cancer-associated genes, with TRAs being capable of inducing the formation of fusion genes. These findings enhance our understanding of somatic SVs in HCC and their role in early tumorigenesis.
Affiliation(s)
- Tianfu Zeng
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Haotian Liao
  - Division of Liver Surgery, Department of General Surgery and Laboratory of Liver Surgery, and State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Lin Xia
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Siyao You
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yanqun Huang
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Jiaxun Zhang
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yahui Liu
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Xuyan Liu
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Dan Xie
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

2. Xu IRL, Danzi MC, Raposo J, Züchner S. The continued promise of genomic technologies and software in neurogenetics. J Neuromuscul Dis 2025:22143602251325345. [PMID: 40208247 DOI: 10.1177/22143602251325345] [Indexed: 04/11/2025]
Abstract
The continued evolution of genomic technologies over the past few decades has revolutionized the field of neurogenetics, offering profound insights into the genetic underpinnings of neurological disorders. Identification of causal genes for numerous monogenic neurological conditions has informed key aspects of disease mechanisms and facilitated research into critical proteins and molecular pathways, laying the groundwork for therapeutic interventions. However, the question remains: has this transformative trend reached its zenith? In this review, we suggest that despite significant strides in genome sequencing and advanced computational analyses, there is still ample room for methodological refinement. We anticipate further major genetic breakthroughs corresponding with the increased use of long-read genomes, variant calling software, AI tools, and data aggregation databases. Genetic progress has historically been driven by technological advancements from the commercial sector, which are developed in response to academic research needs, creating a continuous cycle of innovation and discovery. This review explores the potential of genomic technologies to address the challenges of neurogenetic disorders. By outlining both established and modern resources, we aim to emphasize the importance of genetic technologies as we enter an era poised for discoveries.
Affiliation(s)
- Isaac R L Xu
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jacquelyn Raposo
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Stephan Züchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA

3. An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 PMCID: PMC11433896 DOI: 10.4103/nrr.nrr-d-23-01568] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China

4. Duan DM, Cheng C, Huang YS, Chung AK, Chen PX, Chen YA, Hsu JS, Chen PL. Comparisons of performances of structural variants detection algorithms in solitary or combination strategy. PLoS One 2025; 20:e0314982. [PMID: 39913463 PMCID: PMC11801633 DOI: 10.1371/journal.pone.0314982] [Received: 03/24/2024] [Accepted: 11/19/2024] [Indexed: 02/09/2025] Open
Abstract
Structural variants (SVs) have been associated with changes in gene expression, which may contribute to alterations in phenotypes and disease development. However, the precise identification and characterization of SVs remain challenging. While long-read sequencing offers superior accuracy for SV detection, short-read sequencing remains essential due to practical and cost considerations, as well as the need to analyze existing short-read datasets. Numerous algorithms for short-read SV detection exist, but none are universally optimal, each having limitations for specific SV sizes and types. In this study, we evaluated the efficacy of six advanced SV detection algorithms, including the commercial software DRAGEN, using the GIAB v0.6 Tier 1 benchmark and HGSVC2 cell lines. We employed both individual and combination strategies, with systematic assessments of recall, precision, and F1 scores. Our results demonstrate that the union combination approach enhanced detection capabilities, surpassing single algorithms in identifying deletions and insertions, and delivered comparable recall and F1 scores to the commercial software DRAGEN. Interestingly, expanding the number of algorithms from three to five in the combination did not enhance performance, highlighting the efficiency of a well-chosen ensemble over a larger algorithmic pool.
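The union strategy and the three metrics named in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which an SV is a `(chromosome, start, end, type)` tuple and two calls match only when they are identical; real benchmarks match calls with position and size tolerances, and the call sets and function names below are hypothetical, not taken from the study.

```python
from typing import Iterable, Set, Tuple

# Hypothetical simplified SV key: (chromosome, start, end, type).
SV = Tuple[str, int, int, str]


def union_calls(call_sets: Iterable[Set[SV]]) -> Set[SV]:
    """Union strategy: keep an SV if at least one caller reports it."""
    merged: Set[SV] = set()
    for calls in call_sets:
        merged |= calls
    return merged


def score(calls: Set[SV], truth: Set[SV]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match comparison."""
    tp = len(calls & truth)   # calls that are in the benchmark
    fp = len(calls - truth)   # calls that are not in the benchmark
    fn = len(truth - calls)   # benchmark SVs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
truth = {("chr1", 100, 500, "DEL"), ("chr1", 900, 901, "INS"), ("chr2", 50, 400, "DEL")}
caller_a = {("chr1", 100, 500, "DEL"), ("chr3", 10, 90, "DEL")}
caller_b = {("chr1", 900, 901, "INS"), ("chr2", 50, 400, "DEL")}

combined = union_calls([caller_a, caller_b])
precision, recall, f1 = score(combined, truth)
```

In this toy case the union recovers every benchmark SV that either caller found, but it also inherits caller A's false positive, which is the usual trade-off of a union: recall rises while precision can fall.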
Affiliation(s)
- De-Min Duan
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
  - Division of Cardiology, Department of Internal Medicine and The Cardiovascular Medical Center, Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Taipei, Taiwan
- Chinyi Cheng
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Yu-Shu Huang
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- An-ko Chung
  - Department of Internal Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Pin-Xuan Chen
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Yu-An Chen
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Jacob Shujui Hsu
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Pei-Lung Chen
  - Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan

5. Au-Yeung CCY, Cheung YT, Cheng JYT, Ip KWH, Lee SD, Yang VYT, Lau AYT, Lee CKC, Chong PKH, Lau KW, van Lunenburg JTJ, Zheng DFD, Ho BHM, Tik C, Ho KKK, Rajaby R, Au CH, Yu MHC, Sung WK. UniVar: A variant interpretation platform enhancing rare disease diagnosis through robust filtering and unified analysis of SNV, INDEL, CNV and SV. Comput Biol Med 2025; 185:109560. [PMID: 39700857 DOI: 10.1016/j.compbiomed.2024.109560] [Received: 08/20/2024] [Revised: 11/24/2024] [Accepted: 12/08/2024] [Indexed: 12/21/2024]
Abstract
BACKGROUND: Interpreting the pathogenicity of genetic variants associated with rare diseases is a laborious and time-consuming endeavour. To streamline the diagnostic process and lighten the burden of variant interpretation, it is crucial to automate variant annotation and prioritization. Unfortunately, currently available variant interpretation tools lack a unified and comprehensive workflow that can jointly assess the clinical significance of four types of variants: single-nucleotide variants (SNVs), small insertions/deletions (INDELs), copy number variants (CNVs) and structural variants (SVs).
RESULTS: The Unified Variant Interpretation Platform (UniVar) is a free web server tool that offers an automated and comprehensive workflow for annotation, filtering and prioritization of SNVs, INDELs, CNVs and SVs collectively, identifying disease-causing variants for rare diseases in one interface and ensuring accessibility for users even without programming expertise. To filter common CNVs/SVs, a diverse SV catalogue has been generated that enables robust filtering of common SVs based on population allele frequency. Through benchmarking our SV catalogue, we showed that it is more complete and accurate than the state-of-the-art SV catalogues. Furthermore, to cope with patients lacking detailed clinical information, we have developed a novel computational method that enables variant prioritization from gene panels. Our analysis shows that our approach can prioritize pathogenic variants as effectively as using HPO terms assigned by clinicians, which adds value for cases without specific clinically assigned HPO terms. Lastly, through a practical case study of disease-causing compound heterozygous variants across SNV and SV, we demonstrated the uniqueness and effectiveness of UniVar in variant interpretation, giving it an edge over existing interpretation tools.
CONCLUSIONS: UniVar is a unified and versatile platform that empowers researchers and clinicians to identify and interpret disease-causing variants in rare diseases efficiently through a single holistic interface and without a prerequisite for HPO terms. It is freely available without login or installation at https://univar.live/.
Affiliation(s)
- Cherie C Y Au-Yeung
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Yuen-Ting Cheung
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Joshua Y T Cheng
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Ken W H Ip
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Sau-Dan Lee
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Victor Y T Yang
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Amy Y T Lau
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Chit K C Lee
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Peter K H Chong
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- King Wai Lau
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Damon F D Zheng
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Brian H M Ho
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Crystal Tik
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Kingsley K K Ho
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Ramesh Rajaby
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - Shibuya Laboratory, Division of Medical Data Informatics, Human Genome Center, University of Tokyo, Japan
- Chun-Hang Au
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Mullin H C Yu
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Wing-Kin Sung
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China

6. Gadji M, Kengne-Ouafo JA, Tchouakui M, Wondji MJ, Mugenzi LMJ, Hearn J, Boyomo O, Wondji CS. Genome-wide association studies unveil major genetic loci driving insecticide resistance in Anopheles funestus in four eco-geographical settings across Cameroon. BMC Genomics 2024; 25:1202. [PMID: 39695386 PMCID: PMC11654272 DOI: 10.1186/s12864-024-11148-7] [Received: 06/30/2024] [Accepted: 12/11/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND: Insecticide resistance is jeopardising malaria control efforts in Africa. Deciphering the evolutionary dynamics of mosquito populations country-wide is essential for designing effective and sustainable national and subnational tailored strategies to accelerate malaria elimination efforts. Here, we employed genome-wide association studies through pooled template sequencing to compare four eco-geographically different populations of the major vector, Anopheles funestus, across a south-north transect in Cameroon, aiming to identify genomic signatures of adaptive responses to insecticides.
RESULTS: Our analysis revealed limited population structure within Northern and Central regions (FST < 0.02), suggesting extensive gene flow, while populations from the Littoral/Coastal region exhibited more distinct genetic patterns (FST > 0.049). Greater genetic differentiation was observed at known resistance-associated loci, resistance-to-pyrethroids 1 (rp1) (2R chromosome) and CYP9 (X chromosome), with varying signatures of positive selection across populations. Allelic variation between variants underscores the pervasive impact of selection pressures, with rp1 variants more prevalent in Central and Northern populations (FST > 0.3), and the CYP9-associated variants more pronounced in the Littoral/Coastal region (FST = 0.29). Evidence of selective sweeps was supported by negative Tajima's D and reduced genetic diversity in all populations, particularly in Central (Elende) and Northern (Tibati) regions. Genomic variant analysis identified novel missense mutations and signatures of complex genomic alterations such as duplications, deletions, transposable element (TE) insertions, and chromosomal inversions, all associated with selective sweeps. A 4.3 kb TE insertion was fixed in all populations, with the Njombe Littoral/Coastal population showing a higher frequency of CYP9K1 (G454A), a known resistance allele, and of the upstream TE than elsewhere.
CONCLUSION: Our study uncovered regional variations in insecticide resistance candidate variants, emphasizing the need for a streamlined DNA-based diagnostic assay for genomic surveillance across Africa. These findings will contribute to the development of tailored resistance management strategies crucial for addressing the dynamic challenges of malaria control in Cameroon.
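For readers unfamiliar with the FST values quoted above, the following is a minimal sketch of the textbook (Wright) definition for a single biallelic locus, FST = (H_T - H_S) / H_T, with populations weighted equally. It is an illustration only: the study used pooled sequencing, and its actual estimator is not described in this abstract. The function name and the allele frequencies are hypothetical.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's F_ST at one biallelic locus.

    `freqs` holds the frequency of one allele in each population;
    all populations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity if all populations were pooled together.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere, no differentiation
    # Mean expected heterozygosity within each population.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Invented frequencies: two nearly identical populations, then two diverged ones.
near_identical = fst_biallelic([0.50, 0.52])
diverged = fst_biallelic([0.10, 0.90])
```

Values close to zero, as in the first call, indicate that allele frequencies barely differ between populations, which is consistent with extensive gene flow; large values, as in the second call, indicate strong differentiation at that locus.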
Affiliation(s)
- Mahamat Gadji
  - Centre for Research in Infectious Diseases (CRID), P.O. BOX 13591, Yaoundé, Cameroon
  - The University of Yaoundé 1, P.O BOX 812, Yaoundé, Cameroon
- Jonas A Kengne-Ouafo
  - Centre for Research in Infectious Diseases (CRID), P.O. BOX 13591, Yaoundé, Cameroon
- Magellan Tchouakui
  - Centre for Research in Infectious Diseases (CRID), P.O. BOX 13591, Yaoundé, Cameroon
- Murielle J Wondji
  - Centre for Research in Infectious Diseases (CRID), P.O. BOX 13591, Yaoundé, Cameroon
  - Liverpool School of Tropical Medicine, Pembroke Place Liverpool L3 5QA UK, Liverpool, UK
- Leon M J Mugenzi
  - Syngenta Crop Protection, Werk Stein, Schaffhauserstrasse, Stein, Switzerland
- Jack Hearn
  - Centre for Epidemiology and Planetary Health, Scotland's Rural College (SRUC), RAVIC, 9 Inverness Campus, Inverness, UK
- Onana Boyomo
  - The University of Yaoundé 1, P.O BOX 812, Yaoundé, Cameroon
- Charles S Wondji
  - Centre for Research in Infectious Diseases (CRID), P.O. BOX 13591, Yaoundé, Cameroon
  - Liverpool School of Tropical Medicine, Pembroke Place Liverpool L3 5QA UK, Liverpool, UK

7. Rajaby R, Sung WK. SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads. Nat Commun 2024; 15:10473. [PMID: 39622819 PMCID: PMC11612505 DOI: 10.1038/s41467-024-53087-7] [Received: 10/31/2023] [Accepted: 09/30/2024] [Indexed: 12/06/2024] Open
Abstract
Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence needed by existing short-read-based callers (split reads, discordant pairs or read depth change). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers, both on human and non-human datasets. Then, we demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 is able to complement small indels predicted by Google DeepVariant, and the two tools used in tandem produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs.
Affiliation(s)
- Ramesh Rajaby
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - A*STAR Genome Institute of Singapore, Singapore, Singapore
  - Shibuya Lab, Division of Medical Data Informatics, Human Genome Center, University of Tokyo, Tokyo, Japan
- Wing-Kin Sung
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - A*STAR Genome Institute of Singapore, Singapore, Singapore
  - JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
  - School of Computing, National University of Singapore, Singapore, Singapore

8. Jugas R, Vitkova H. ProcaryaSV: structural variation detection pipeline for bacterial genomes using short-read sequencing. BMC Bioinformatics 2024; 25:233. [PMID: 38982375 PMCID: PMC11234778 DOI: 10.1186/s12859-024-05843-1] [Received: 01/31/2024] [Accepted: 06/13/2024] [Indexed: 07/11/2024] Open
Abstract
BACKGROUND: Structural variations play an important role in bacterial genomes. They can mediate genome adaptation quickly in response to the external environment and thus can also play a role in antibiotic resistance. The detection of structural variations in bacteria is challenging, and the recognition of even small rearrangements can be important. Even though most detection tools are aimed at and benchmarked on eukaryotic genomes, they can also be used on prokaryotic genomes. The key features of detection are the ability to detect small rearrangements and support for haploid genomes. Because of the limited performance of a single detection tool, combining the detection abilities of multiple tools can lead to more robust results. Workflows aimed at bacteria already exist for structural variation detection from long-read technologies and for the detection of single-nucleotide variation and indels. Yet we are unaware of structural variation detection workflows for short-read sequencing platforms. Motivated by this gap, we created our workflow. Further, we were interested in increasing the detection performance and providing more robust results.
RESULTS: We developed an open-source bioinformatics pipeline, ProcaryaSV, for the detection of structural variations in bacterial isolates from paired-end short sequencing reads. The pipeline integrates quality control and trimming of sequencing data, alignment to the reference genome, and multiple structural variation detection tools. All the partial results are then processed and merged with an in-house merging algorithm. Compared with a single detection approach, ProcaryaSV has improved detection performance and is a reproducible, easy-to-use tool.
CONCLUSIONS: The ProcaryaSV pipeline provides an integrative approach to structural variation detection from paired-end next-generation sequencing of bacterial samples. It can be easily installed and used on Linux machines. It is publicly available on GitHub at https://github.com/robinjugas/ProcaryaSV.
Affiliation(s)
- Robin Jugas
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Helena Vitkova
  - Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic

9. Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan

10. Darian JC, Kundu R, Rajaby R, Sung WK. Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly. Nat Methods 2024; 21:574-583. [PMID: 38459383 DOI: 10.1038/s41592-023-02141-1] [Received: 09/08/2022] [Accepted: 11/30/2023] [Indexed: 03/10/2024]
Abstract
Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and few switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
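To put "a quality value exceeding 50" in concrete terms, the sketch below converts between a quality value and a per-base error rate. It assumes the quality value follows the usual Phred convention, QV = -10 log10(error rate), which is how assembly QV is commonly reported; the abstract itself does not state the formula, and the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate in (0, 1]."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


rate_q50 = qv_to_error_rate(50)          # error rate implied by QV 50
bases_per_error = 1 / rate_q50           # average spacing between errors
qv_round_trip = error_rate_to_qv(rate_q50)
```

Under this convention, QV 50 corresponds to roughly one error per 100,000 bases, and each additional 10 points of QV reduces the error rate tenfold.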
Affiliation(s)
- Ritu Kundu
  - School of Computing, National University of Singapore, Singapore, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, Singapore, Singapore
  - Genome Institute of Singapore, Singapore, Singapore
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
  - Hong Kong Genome Institute, Hong Kong, China
11
|
Joe S, Park JL, Kim J, Kim S, Park JH, Yeo MK, Lee D, Yang JO, Kim SY. Comparison of structural variant callers for massive whole-genome sequence data. BMC Genomics 2024; 25:318. [PMID: 38549092 PMCID: PMC10976732 DOI: 10.1186/s12864-024-10239-9] [Received: 07/11/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] Open access
Abstract
BACKGROUND: Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers.
RESULTS: Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient use of computing resources, and both Manta and MELT demonstrated relatively good precision in calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications, as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance for deletions and insertions.
CONCLUSIONS: Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.
Grants
- NRF-2020M3E5D708517212, 2020M3A9I6A0103605713 Ministry of Science and ICT, South Korea
- NTIS-1711170620 KRIBB Research Initiative Program
Affiliation(s)
- Soobok Joe
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jong-Lyul Park
  - Aging Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Functional Genomics, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Jun Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Sangok Kim
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Ji-Hwan Park
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Min-Kyung Yeo
  - Department of Pathology, Chungnam National University School of Medicine, Daejeon, 35015, Republic of Korea
- Dongyoon Lee
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jin Ok Yang
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Seon-Young Kim
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
12
|
Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res 2024; 34:300-309. [PMID: 38355307 PMCID: PMC10984387 DOI: 10.1101/gr.278267.123] [Received: 07/11/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so, the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
Affiliation(s)
- Xena M Mapel
  - Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
- Hubert Pausch
  - Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland