1
Abstract
A single reference genome does not fully capture species diversity. By contrast, a pangenome incorporates multiple genomes to capture the entire set of nonredundant genes in a given species, along with its genome diversity. New sequencing technologies enable researchers to produce multiple high-quality genome sequences and catalog diverse genetic variations with better precision. Pangenomic studies have detected structural variants in plant genomes, dissected the genetic architecture of agronomic traits, and helped unravel molecular underpinnings and evolutionary origins of plant phenotypes. The pangenome concept has further evolved into a so-called super-pangenome that includes wild relatives within a genus or clade and shifted to graph-based reference systems. Nevertheless, building pangenomes and representing complex structural variants remain challenging in many crops. Standardized computing pipelines and common data structures are needed to compare and interpret pangenomes. The growing body of plant pangenomics data requires new algorithms, huge data storage capacity, and training to help researchers and breeders take advantage of newly discovered genes and genetic variants.
Affiliation(s)
- Murukarthick Jayakodi
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA;
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, Texas, USA
- Hyeonah Shim
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
- Martin Mascher
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany;
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
2
Keskus AG, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Bi C, Walter A, Gibson M, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing. Nat Biotechnol 2025:10.1038/s41587-025-02618-8. [PMID: 40185952 DOI: 10.1038/s41587-025-02618-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Accepted: 02/26/2025] [Indexed: 04/07/2025]
Abstract
For the detection of somatic structural variation (SV) in cancer genomes, long-read sequencing is advantageous over short-read sequencing with respect to mappability and variant phasing. However, most current long-read SV detection methods are not developed for the analysis of tumor genomes characterized by complex rearrangements and heterogeneity. Here, we present Severus, a breakpoint graph-based algorithm for somatic SV calling from long-read cancer sequencing. Severus works with matching normal samples, supports unbalanced cancer karyotypes, can characterize complex multibreak SV patterns and produces haplotype-specific calls. On a comprehensive multitechnology cell line panel, Severus consistently outperforms other long-read and short-read methods in terms of SV detection F1 score (harmonic mean of the precision and recall). We also illustrate that compared to long-read methods, short-read sequencing systematically misses certain classes of somatic SVs, such as insertions or clustered rearrangements. We apply Severus to several clinical cases of pediatric leukemia/lymphoma, revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
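The F1 score mentioned above combines precision and recall into a single number. The following is a minimal Python sketch of that definition, not the evaluation code used for Severus; the function name `f1_score` and the example counts are illustrative assumptions.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a set of SV calls."""
    if true_positives == 0:
        # No correct calls: report 0.0 rather than dividing by zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 calls match the truth set, 10 do not, 30 true SVs are missed.
# Precision is 0.90 and recall is 0.75.
print(round(f1_score(90, 10, 30), 3))  # prints 0.818
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 score by maximizing precision or recall alone.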
Affiliation(s)
- Ayse G Keskus
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A Lansdon
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Joshua Gardner
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Brandy McNulty
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Samuel Sacco
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Chengpeng Bi
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Adam Walter
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Margaret Gibson
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Irina Pushel
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H Miga
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S Farooqi
- Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Benedict Paten
- University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Mikhail Kolmogorov
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA.
3
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2025; 43:181-185. [PMID: 38519720 PMCID: PMC11825360 DOI: 10.1038/s41587-024-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SVs comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches; in particular, it improves the resolution of complex SVs, with low Mendelian error rates, high sensitivity for low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
- Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
4
Zhang Z, Zhang J, Kang L, Qiu X, Xu S, Xu J, Guo Y, Niu Z, Niu B, Bi A, Zhao X, Xu D, Wang J, Yin C, Lu F. Structural variation discovery in wheat using PacBio high-fidelity sequencing. Plant J 2024; 120:687-698. [PMID: 39239888 DOI: 10.1111/tpj.17011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 08/09/2024] [Accepted: 08/22/2024] [Indexed: 09/07/2024]
Abstract
Structural variations (SVs) pervade plant genomes and contribute substantially to phenotypic diversity. However, most SVs were ineffectively assayed due to their complex nature and the limitations of early genomic technologies. By applying PacBio high-fidelity (HiFi) sequencing to wheat genomes, we performed a comprehensive evaluation of mainstream long-read aligners and SV callers in SV detection. The results indicated that the accuracy of deletion discovery is markedly influenced by callers, accounting for 87.73% of the variance, whereas both aligners (38.25%) and callers (49.32%) contributed substantially to the accuracy variance for insertions. Among the aligners, Winnowmap2 and NGMLR excelled in detecting deletions and insertions, respectively. For SV callers, SVIM achieved the best performance. We demonstrated that combining the aligners and callers mentioned above is optimal for SV detection. Furthermore, we evaluated the effect of sequencing depth on the accuracy of SV detection, revealing that low-coverage HiFi sequencing is sufficiently robust for high-quality SV discovery. This study thoroughly evaluated SV discovery approaches and established optimal workflows for investigating structural variations using low-coverage HiFi sequencing in the wheat genome, which will advance SV discovery and help decipher the biological functions of SVs in wheat and many other plants.
Affiliation(s)
- Zhiliang Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jijin Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lipeng Kang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xuebing Qiu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Song Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jun Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yafei Guo
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zelin Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Beirui Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Aoyue Bi
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xuebo Zhao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Daxing Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jing Wang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- Changbin Yin
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- Fei Lu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
5
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 PMCID: PMC11322167 DOI: 10.1038/s41467-024-51282-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
Affiliation(s)
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, Nashville, TN, USA.
6
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] Open
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
7
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
- Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan.
- Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan.
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
8
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] Open
Abstract
Structural variants (SVs) are one of the significant types of DNA mutations and are typically defined as larger-than-50-bp genomic alterations that include insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly impact the phenotypic characteristics and contribute to disorders like cancer, response to treatment, and infections. Four long-read aligners and five SV callers have been evaluated using three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score statistical metrics, depth of coverage, and speed of analysis. The best SV caller regarding recall, precision, and F1-score when matched with different aligners at different coverage levels tends to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM, PBSV, and SVDSS in the last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision value (94.33%). Minimap2 as an aligner and Sniffles as an SV caller act as a strong base for the pipeline of SV calling because of their high speed and reasonable performance. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and overlook some actual SVs. Our results are valuable in the comprehensive evaluation of popular SV callers and aligners as they provide insight into the performance of several long-read aligners and SV callers and serve as a reference for researchers in selecting the most suitable tools for SV detection.
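The size-based definition at the start of this abstract can be illustrated with a short Python sketch. The 50 bp cutoff comes from the abstract; the function name, the label strings and the strict `>` comparison are assumptions made for illustration, and individual callers may apply the threshold differently.

```python
SV_MIN_LENGTH_BP = 50  # cutoff quoted in the abstract; treated here as exclusive


def classify_by_size(length_bp: int, threshold: int = SV_MIN_LENGTH_BP) -> str:
    """Label a variant as a structural variant if it is longer than the threshold."""
    if length_bp < 0:
        raise ValueError("variant length must be non-negative")
    return "SV" if length_bp > threshold else "small variant"


# Hypothetical variant lengths in base pairs.
for length in (1, 50, 51, 12000):
    print(length, classify_by_size(length))
```

Keeping the threshold as a parameter makes it easy to rerun a comparison under a different size convention.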
Affiliation(s)
- Asmaa A Helal
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Bishoy T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt.
- Mina T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Gamal S Mosaad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Khaled M Aboshanab
- Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt.
9
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
10
Yu J, Chen N, Zheng Z, Gao M, Liang N, Wong KC. Chromothripsis detection with multiple myeloma patients based on deep graph learning. Bioinformatics 2023; 39:btad422. [PMID: 37399092 PMCID: PMC10343948 DOI: 10.1093/bioinformatics/btad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/20/2023] [Accepted: 06/30/2023] [Indexed: 07/05/2023] Open
Abstract
MOTIVATION: Chromothripsis, associated with poor clinical outcomes, is prognostically vital in multiple myeloma. The catastrophic event is reported to be detectable prior to the progression of multiple myeloma. As a result, chromothripsis detection can contribute to risk estimation and early treatment guidelines for multiple myeloma patients. However, manual diagnosis remains the gold standard approach to detect chromothripsis events with the whole-genome sequencing technology to retrieve both copy number variation (CNV) and structural variation data. Meanwhile, CNV data are much easier to obtain than structural variation data. Hence, in order to reduce the reliance on human experts' efforts and structural variation data extraction, it is necessary to establish a reliable and accurate chromothripsis detection method based on CNV data.
RESULTS: To address those issues, we propose a method to detect chromothripsis solely based on CNV data. With the help of structure learning, the intrinsic relationship-directed acyclic graph of CNV features is inferred to derive a CNV embedding graph (i.e. CNV-DAG). Subsequently, a neural network based on Graph Transformer, local feature extraction, and non-linear feature interaction, is proposed with the embedding graph as the input to distinguish whether the chromothripsis event occurs. Ablation experiments, clustering, and feature importance analysis are also conducted to enable the proposed model to be explained by capturing mechanistic insights.
AVAILABILITY AND IMPLEMENTATION: The source code and data are freely available at https://github.com/luvyfdawnYu/CNV_chromothripsis.
Affiliation(s)
- Jixiang Yu
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Nanjun Chen
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Ming Gao
- School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian 116025, China
- Ning Liang
- University of Michigan, Ann Arbor, MI 48105, United States
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen 518057, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
11
Sikic M. Facilitating genome structural variation analysis. Nat Methods 2023; 20:491-492. [PMID: 36959321 DOI: 10.1038/s41592-023-01767-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Affiliation(s)
- Mile Sikic
- Laboratory of AI in Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore.
- Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.