1
Li Q, Keskus AG, Wagner J, Izydorczyk MB, Timp W, Sedlazeck FJ, Klein AP, Zook JM, Kolmogorov M, Schatz MC. Unraveling the hidden complexity of cancer through long-read sequencing. Genome Res 2025; 35:599-620. [PMID: 40113261 PMCID: PMC12047254 DOI: 10.1101/gr.280041.124] [Indexed: 03/22/2025]
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies predominantly use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were previously inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
Affiliation(s)
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Michal B Izydorczyk
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77251, USA
- Alison P Klein
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
2
Keskus AG, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Bi C, Walter A, Gibson M, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing. Nat Biotechnol 2025:10.1038/s41587-025-02618-8. [PMID: 40185952 DOI: 10.1038/s41587-025-02618-8] [Received: 08/21/2024] [Accepted: 02/26/2025] [Indexed: 04/07/2025]
Abstract
For the detection of somatic structural variation (SV) in cancer genomes, long-read sequencing is advantageous over short-read sequencing with respect to mappability and variant phasing. However, most current long-read SV detection methods are not developed for the analysis of tumor genomes characterized by complex rearrangements and heterogeneity. Here, we present Severus, a breakpoint graph-based algorithm for somatic SV calling from long-read cancer sequencing. Severus works with matching normal samples, supports unbalanced cancer karyotypes, can characterize complex multibreak SV patterns and produces haplotype-specific calls. On a comprehensive multitechnology cell line panel, Severus consistently outperforms other long-read and short-read methods in terms of SV detection F1 score (harmonic mean of the precision and recall). We also illustrate that compared to long-read methods, short-read sequencing systematically misses certain classes of somatic SVs, such as insertions or clustered rearrangements. We apply Severus to several clinical cases of pediatric leukemia/lymphoma, revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
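The F1 score named in the abstract above is the harmonic mean of precision and recall. The following is a minimal Python sketch of that calculation from call counts. The function name `f1_score` and the example counts are illustrative assumptions. They are not taken from the Severus paper or its benchmarking code.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, since precision
    and recall are then zero or undefined.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for one SV caller against a truth set.
    print(f1_score(true_positives=80, false_positives=20, false_negatives=20))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a caller cannot score well by maximizing only precision or only recall.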
Affiliation(s)
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A Lansdon
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Joshua Gardner
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Brandy McNulty
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Samuel Sacco
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
  - Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K Molloy
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Chengpeng Bi
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Adam Walter
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Margaret Gibson
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Irina Pushel
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H Miga
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S Farooqi
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Benedict Paten
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
3
Han H, Lee HH, Kim MG, Shin YS, Chung JS, Kim J. Genome assembly resources of genitourinary cancers for chromosomal aberration at the single nucleotide level. Sci Data 2025; 12:550. [PMID: 40169664 PMCID: PMC11962096 DOI: 10.1038/s41597-025-04801-7] [Received: 12/06/2023] [Accepted: 03/11/2025] [Indexed: 04/03/2025]
Abstract
Traditionally, the evolutionary perspective of cancer has been understood as gradual alterations in passenger/driver genes that lead to branching phylogeny. However, in cases of prostate adenocarcinoma and kidney renal cell carcinoma, macroevolutionary landmarks like chromoplexy and chromothripsis are frequently observed. Unfortunately, short-read sequencing techniques often miss these significant macroevolutionary changes, which involve multiple translocations and deletions at the chromosomal level. To resolve such genomic dark matters, we provided high-fidelity long-read sequencing data (78-92 Gb of ~Q30 reads) of six genitourinary tumour cell lines (one benign kidney tumour and two kidney and three prostate cancers). Based on these data, we obtained 12 high-quality, partially phased genome assemblies (Contig N50 1.85-29.01 Mb; longest contig 2.02-171.62 Mb), graph-based pan-genome variant sets (11.57 M variants including 60 K structural variants), and 5-methylcytosine sites (14.68%-27.05% of the CpG sites). We also identified several severe chromosome aberration events, which would result from chromosome break and fusion events. Our cancer genome assemblies will provide unprecedented resolution to understand cancer genome instability and chromosomal aberration.
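The contig N50 values quoted in the abstract above follow the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. Below is a minimal Python sketch of that calculation. The function name `contig_n50` and the toy lengths are illustrative assumptions and do not come from the cited data set.

```python
def contig_n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest. The N50 is the length
    of the contig at which the running total first reaches at least half
    of the summed assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe test for running >= total / 2
            return length
    raise AssertionError("unreachable for a non-empty list")


if __name__ == "__main__":
    # Toy assembly with five contigs (arbitrary units).
    print(contig_n50([2, 3, 4, 5, 6]))
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside the longest-contig length.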
Affiliation(s)
- Hyunho Han
  - Department of Urology, Urological Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Hyung Ho Lee
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Min Gyu Kim
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Yoo Sub Shin
  - Department of Urology, Urological Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jin Soo Chung
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Jun Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Korea
4
Su X, Lin Q, Liu B, Zhou C, Lu L, Lin Z, Si J, Ding Y, Duan S. The promising role of nanopore sequencing in cancer diagnostics and treatment. Cell Insight 2025; 4:100229. [PMID: 39995512 PMCID: PMC11849079 DOI: 10.1016/j.cellin.2025.100229] [Received: 09/14/2024] [Revised: 01/13/2025] [Accepted: 01/14/2025] [Indexed: 02/26/2025]
Abstract
Cancer arises from genetic alterations that impact both the genome and transcriptome. The utilization of nanopore sequencing offers a powerful means of detecting these alterations due to its unique capacity for long single-molecule sequencing. In the context of DNA analysis, nanopore sequencing excels in identifying structural variations (SVs), copy number variations (CNVs), gene fusions within SVs, and mutations in specific genes, including those involving DNA modifications and DNA adducts. In the field of RNA research, nanopore sequencing proves invaluable in discerning differentially expressed transcripts, uncovering novel elements linked to transcriptional regulation, and identifying alternative splicing events and RNA modifications at the single-molecule level. Furthermore, nanopore sequencing extends its reach to detecting microorganisms, encompassing bacteria and viruses, that are intricately associated with tumorigenesis and the development of cancer. Consequently, the application prospects of nanopore sequencing in tumor diagnosis and personalized treatment are expansive, encompassing tasks such as tumor identification and classification, the tailoring of treatment strategies, and the screening of prospective patients. In essence, this technology stands poised to unearth novel mechanisms underlying tumorigenesis while providing dependable support for the diagnosis and treatment of cancer.
Affiliation(s)
- Xinming Su
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Qingyuan Lin
  - The Second Clinical Medical College, Zhejiang Chinese Medicine University BinJiang College, Hangzhou 310053, Zhejiang, China
- Bin Liu
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Chuntao Zhou
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Liuyi Lu
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Zihao Lin
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Jiahua Si
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Yuemin Ding
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Institute of Translational Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Shiwei Duan
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Institute of Translational Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
5
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2025; 43:181-185. [PMID: 38519720 PMCID: PMC11825360 DOI: 10.1038/s41587-024-02190-7] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SVs comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches. In particular, it improves the resolution of complex SVs, with low Mendelian error rates, high sensitivity for low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Affiliation(s)
- Songbo Wang
  - Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jiadong Lin
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Peng Jia
  - Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Tun Xu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiujuan Li
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yuezhuangnan Liu
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Dan Xu
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Stephen J Bush
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Deyu Meng
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
  - Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
  - Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
- Kai Ye
  - Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
  - Faculty of Science, Leiden University, Leiden, The Netherlands
  - Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
6
Ye F, Zhu J, Zhang X, Zhang J, Xie Z, Yang T, Han Y, Yang X, Ren Z, Ni M. Characteristics and filtering of low-frequency artificial short deletion variations based on nanopore sequencing. Gigascience 2025; 14:giaf018. [PMID: 40117177 PMCID: PMC11927395 DOI: 10.1093/gigascience/giaf018] [Received: 07/29/2024] [Revised: 12/20/2024] [Accepted: 02/09/2025] [Indexed: 03/23/2025]
Abstract
BACKGROUND: Nanopore sequencing is characterized by high portability and long reads, albeit accompanied by systematic errors causing short deletions. Few tools can filter low-frequency artificial deletions, especially in single samples.

RESULTS: To solve this problem, we first synthesized or purchased 17 DNA/RNA standards for nanopore sequencing with R9 and R10 flowcells to obtain benchmarking datasets. False-positive (FP) deletions were prevalent (75.86%-96.26%), while the majority (62.07%-79.68%) were located in homopolymeric regions. The 10-mer base-quality scores (Q scores) and sequencing speeds flanking the FP homopolymeric deletions marginally differed from the true-positive (TP) deletions. We thus investigated the raw current signals after normalizing them by length. We found more significant differences in current signals between the reads with and without FP deletions. Indexes including the MRPP A (Multiple Response Permutation Procedure, statistic A), the accumulative difference of normalized current signals, and the Q score were tested for the power of distinguishing between FP and TP deletions. MRPP A outperformed the other indexes in homopolymeric regions and achieved the highest accuracy of 76.73% for challenging 1-base homopolymeric deletions. When sequencing depth was low, the Q score performed better than MRPP A. We developed Delter (Deletion filter) to filter low-frequency FP deletions of nanopore sequencing in single samples, which removed 60.98% to 100% of artificial homopolymeric deletions in real samples.

CONCLUSIONS: Low-frequency artificial short deletion variations, especially the most challenging homopolymeric deletions, could be effectively filtered by Delter using normalized current signals or Q scores according to the employed sequencing strategies.
Affiliation(s)
- Fuqiang Ye
  - Huadong Research Institute for Medicine and Biotechniques, Nanjing 210002, People’s Republic of China
- Juanjuan Zhu
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing 211198, People’s Republic of China
- Xiaomin Zhang
  - Department of Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, Beijing 100850, People’s Republic of China
- Jiarong Zhang
  - Department of Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, Beijing 100850, People’s Republic of China
  - School of Forensic Medicine, Shanxi Medical University, Jinzhong 030600, People’s Republic of China
- Zihan Xie
  - Department of Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, Beijing 100850, People’s Republic of China
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, People’s Republic of China
- Tingting Yang
  - Department of Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, Beijing 100850, People’s Republic of China
  - School of Forensic Medicine, Shanxi Medical University, Jinzhong 030600, People’s Republic of China
- Yifang Han
  - Huadong Research Institute for Medicine and Biotechniques, Nanjing 210002, People’s Republic of China
- Xiaohong Yang
  - Huadong Research Institute for Medicine and Biotechniques, Nanjing 210002, People’s Republic of China
- Zilin Ren
  - Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control, Changchun 130122, People’s Republic of China
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, People’s Republic of China
- Ming Ni
  - Department of Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, Beijing 100850, People’s Republic of China
7
Kechin A, Koryukov M, Mikheeva R, Filipenko M. Homologous recombination deficiency (HRD) diagnostics: underlying mechanisms and new perspectives. Cancer Metastasis Rev 2024; 44:19. [PMID: 39724448 DOI: 10.1007/s10555-024-10238-y] [Received: 08/11/2024] [Accepted: 12/20/2024] [Indexed: 12/28/2024]
Abstract
Homologous recombination deficiency (HRD) is considered a universal and effective sign of a tumor's sensitivity to poly(ADP-ribose) polymerase (PARP) inhibitors. HRD diagnostics have undergone several stages of transformation: from detection of point mutations in HR-related genes and large regions with loss of heterozygosity detected using single-nucleotide polymorphism arrays to whole-genome signatures of single-nucleotide variants, large genomic rearrangements (LGRs), and copy number alterations. All these methods have their own advantages and limitations. HRD tests based on signatures of LGRs and copy number alterations show retrospectively that some progenitor cells possessed HRD status; they do not reflect the current state of the genome. The aim of this review was to compare different methods of HRD detection and mechanisms of formation of HRD-specific LGRs. In the last several years, new data have appeared implying a crucial role of the proteins BRCA1 and BRCA2 in the resolution of stalled replication forks, which may be associated with at least some of the LGRs observed in HRD-positive tumors. Reviewing current knowledge on these mechanisms, distributions of different LGR types, and limitations of sequencing technologies and algorithms of data analysis, we offer some new perspectives on HRD diagnostics. We hope that this review will help to accelerate the development of new diagnostic approaches in this important field of molecular oncology.
Affiliation(s)
- Andrey Kechin
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Novosibirsk, 630090, Russia
- Maksim Koryukov
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Novosibirsk, 630090, Russia
- Regina Mikheeva
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Novosibirsk, 630090, Russia
- Maksim Filipenko
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, 630090, Russia
8
Ren L, Shi L, Zheng Y. Reference Materials for Improving Reliability of Multiomics Profiling. Phenomics (Cham, Switzerland) 2024; 4:487-521. [PMID: 39723231 PMCID: PMC11666855 DOI: 10.1007/s43657-023-00153-7] [Received: 08/25/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 12/28/2024]
Abstract
High-throughput technologies for multiomics or molecular phenomics profiling have been extensively adopted in biomedical research and clinical applications, offering a more comprehensive understanding of biological processes and diseases. Omics reference materials play a pivotal role in ensuring the accuracy, reliability, and comparability of laboratory measurements and analyses. However, the current application of omics reference materials has revealed several issues, including inappropriate selection and underutilization, leading to inconsistencies across laboratories. This review aims to address these concerns by emphasizing the importance of well-characterized reference materials at each level of omics, encompassing (epi-)genomics, transcriptomics, proteomics, and metabolomics. By summarizing their characteristics, advantages, and limitations along with appropriate performance metrics pertinent to study purposes, we provide an overview of how omics reference materials can enhance data quality and data integration, thus fostering robust scientific investigations with omics technologies.
Affiliation(s)
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
  - Shanghai Cancer Center, Fudan University, Shanghai, 200032 China
  - International Human Phenome Institutes, Shanghai, 200438 China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
9
Nepal C, Chen W, Chen Z, Wrobel JA, Xie L, Liao W, Xiao C, Farmer A, Moos M, Jones W, Chen X, Wang C. Epigenomic, transcriptomic and proteomic characterizations of reference samples. bioRxiv 2024:2024.09.09.612110. [PMID: 39314461 PMCID: PMC11419083 DOI: 10.1101/2024.09.09.612110] [Indexed: 09/25/2024]
Abstract
A variety of newly developed next-generation sequencing technologies are rapidly making their way into research and clinical applications, for which accuracy and cross-lab reproducibility are critical and reference standards are much needed. Our previous multicenter studies under the SEQC-2 umbrella, using a breast cancer cell line with a paired B-cell line, produced a large amount of genomic data, including whole-genome sequencing (Illumina, PacBio, Nanopore), HiC, and scRNA-seq, with detailed analyses of somatic mutations, single-nucleotide variations (SNVs), and structural variations (SVs). However, there is still a lack of well-characterized reference materials that include epigenomic and proteomic data. Here we further performed ATAC-seq, Methyl-seq, RNA-seq, and proteomic analyses and provided a comprehensive catalog of the epigenomic landscape, which overlapped with the transcriptomes and proteomes for the two cell lines. We identified >7,700 peptide isoforms, where the majority (95%) of the genes had a single peptide isoform. Protein expression of the transcripts overlapping CGIs was much higher than that of the non-CGI transcripts in both cell lines. We further found evidence that certain SNVs were incorporated into mutated peptides. We observed that open chromatin regions had low methylation and were largely regulated by CG density: CG-rich regions had more accessible chromatin, low methylation, and higher gene and protein expression. The CG-poor regions had more repressive epigenetic regulation (higher DNA methylation) and less open chromatin, resulting in cell-line-specific methylation and gene expression patterns.
Our studies provide well-defined reference materials consisting of two cell lines with genomic, epigenomic, transcriptomic, scRNA-seq and proteomic characterizations, which can serve as standards for validating and benchmarking not only various omics assays but also bioinformatics methods. They will be a valuable resource for both research and clinical communities.
Affiliation(s)
- Chirag Nepal
  - Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Wanqiu Chen
  - Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Zhong Chen
  - Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- John A. Wrobel
  - Dept. of Biochemistry and Biophysics and Proteomic Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA
- Ling Xie
  - Dept. of Biochemistry and Biophysics and Proteomic Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA
- Wenjing Liao
  - Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894, USA
- Malcolm Moos
  - Center for Biologics Evaluation and Research & Division of Cellular and Gene Therapies, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Xian Chen
  - Dept. of Biochemistry and Biophysics and Proteomic Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA
- Charles Wang
  - Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
|
10
|
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 PMCID: PMC11322167 DOI: 10.1038/s41467-024-51282-0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
Affiliation(s)
- Can Luo
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yichen Henry Liu
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
  - Data Science Institute, Vanderbilt University, Nashville, TN, USA.
|
11
|
Pei Y, Tanguy M, Giess A, Dixit A, Wilson LC, Gibbons RJ, Twigg SRF, Elgar G, Wilkie AOM. A Comparison of Structural Variant Calling from Short-Read and Nanopore-Based Whole-Genome Sequencing Using Optical Genome Mapping as a Benchmark. Genes (Basel) 2024; 15:925. [PMID: 39062704 PMCID: PMC11276380 DOI: 10.3390/genes15070925] [Received: 06/04/2024] [Revised: 07/03/2024] [Accepted: 07/11/2024] [Indexed: 07/28/2024] Open
Abstract
The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
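The percentages quoted in this abstract follow directly from the stated counts. The short Python sketch below recomputes them as a check; the helper name `pct` and the variable names are illustrative and do not come from the paper.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts as quoted in the abstract above.
ogm_ppv = pct(222, 234)        # verified OGM calls / rare proband OGM calls
illumina_del = pct(115, 134)   # Illumina sensitivity, deletions
illumina_ins = pct(13, 58)     # Illumina sensitivity, insertions
sniffles2_del = pct(36, 40)    # ONT + Sniffles2 sensitivity, deletions
sniffles2_ins = pct(17, 23)    # ONT + Sniffles2 sensitivity, insertions

for label, value in [
    ("OGM positive predictive value", ogm_ppv),
    ("Illumina deletions", illumina_del),
    ("Illumina insertions", illumina_ins),
    ("Sniffles2 deletions", sniffles2_del),
    ("Sniffles2 insertions", sniffles2_ins),
]:
    print(f"{label}: {value:.0f}%")
```

Rounded to whole percentages, these reproduce the figures given in the abstract (95%, 86%, 22%, 90%, and 74%).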
Affiliation(s)
- Yang Pei
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Melanie Tanguy
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Adam Giess
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Abhijit Dixit
  - Clinical Genetics Service, Nottingham University Hospitals NHS Foundation Trust, City Hospital, Nottingham NG5 1PB, UK
- Louise C. Wilson
  - North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street Hospital, London WC1N 3JH, UK
- Richard J. Gibbons
  - MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Stephen R. F. Twigg
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Greg Elgar
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Andrew O. M. Wilkie
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
|
12
|
Masood D, Ren L, Nguyen C, Brundu FG, Zheng L, Zhao Y, Jaeger E, Li Y, Cha SW, Halpern A, Truong S, Virata M, Yan C, Chen Q, Pang A, Alberto R, Xiao C, Yang Z, Chen W, Wang C, Cross F, Catreux S, Shi L, Beaver JA, Xiao W, Meerzaman DM. Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome. Genome Biol 2024; 25:163. [PMID: 38902799 PMCID: PMC11188507 DOI: 10.1186/s13059-024-03294-8] [Received: 03/31/2023] [Accepted: 05/29/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND: Copy number variation (CNV) is a key genetic characteristic for cancer diagnostics and can be used as a biomarker for the selection of therapeutic treatments. Using data sets established in our previous study, we benchmark the performance of cancer CNV calling by six most recent and commonly used software tools on their detection accuracy, sensitivity, and reproducibility. In comparison to other orthogonal methods, such as microarray and Bionano, we also explore the consistency of CNV calling across different technologies on a challenging genome. RESULTS: While consistent results are observed for copy gain, loss, and loss of heterozygosity (LOH) calls across sequencing centers, CNV callers, and different technologies, variation of CNV calls are mostly affected by the determination of genome ploidy. Using consensus results from six CNV callers and confirmation from three orthogonal methods, we establish a high confident CNV call set for the reference cancer cell line (HCC1395). CONCLUSIONS: NGS technologies and current bioinformatics tools can offer reliable results for detection of copy gain, loss, and LOH. However, when working with a hyper-diploid genome, some software tools can call excessive copy gain or loss due to inaccurate assessment of genome ploidy. With performance matrices on various experimental conditions, this study raises awareness within the cancer research community for the selection of sequencing platforms, sample preparation, sequencing coverage, and the choice of CNV detection tools.
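To make the ploidy point concrete, here is a toy Python sketch of why the assumed genome ploidy changes gain and loss calls. It is not the method of any caller benchmarked in the study: the function `classify_segments`, the segment copy numbers, and the two ploidy values are all hypothetical, and real tools estimate copy number from read depth and allele frequencies rather than taking it as input.

```python
def classify_segments(copy_numbers, ploidy):
    """Label each segment relative to the rounded genome ploidy.

    Toy rule (an assumption for illustration): a segment is a "gain" if its
    absolute copy number exceeds the baseline, a "loss" if it is below it,
    and "neutral" otherwise.
    """
    baseline = round(ploidy)
    calls = []
    for cn in copy_numbers:
        if cn > baseline:
            calls.append("gain")
        elif cn < baseline:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical absolute copy numbers for five segments.
segments = [2, 3, 3, 4, 1]

# Same segments, two different ploidy assumptions.
calls_assuming_diploid = classify_segments(segments, ploidy=2.0)
calls_assuming_hyperdiploid = classify_segments(segments, ploidy=2.8)

print(calls_assuming_diploid)
print(calls_assuming_hyperdiploid)
```

Under the diploid assumption three of the five segments are labelled as gains; with a baseline of three copies only one is, and a two-copy segment becomes a loss. This mirrors, in a simplified way, how a misjudged ploidy can inflate gain or loss calls.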
Affiliation(s)
- Daniall Masood
  - Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Cu Nguyen
  - Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Lily Zheng
  - Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Yongmei Zhao
  - Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yong Li
  - Illumina Inc., San Diego, CA, USA
- Chunhua Yan
  - Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Qingrong Chen
  - Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Andy Pang
  - Bionano Genomics, San Diego, CA, 20892, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- Zhaowei Yang
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Charles Wang
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Frank Cross
  - Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Julia A Beaver
  - Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
  - Oncology Center of Excellence, Food and Drug Administration, Silver Spring, MD, USA
- Wenming Xiao
  - Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA.
- Daoud M Meerzaman
  - Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA.
|
13
|
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. Bioinformatics 2024; 40:btae340. [PMID: 38796686 PMCID: PMC11153836 DOI: 10.1093/bioinformatics/btae340] [Received: 10/27/2023] [Revised: 05/04/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024] Open
Abstract
SUMMARY: The increasing development of sequence-based machine learning models has raised the demand for manipulating sequences for this application. However, existing approaches to edit and evaluate genome sequences using models have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for vcf file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences. AVAILABILITY AND IMPLEMENTATION: SuPreMo was written in Python, and can be run using only one line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are on the GitHub page: https://github.com/ketringjoni/SuPreMo.
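As a rough illustration of what a "pair of reference and perturbed sequences" means, the Python sketch below applies a single variant to a sequence string. It is not SuPreMo's API or implementation: the function `apply_variant`, the 0-based coordinate convention, and the toy sequence are assumptions made for this example only.

```python
def apply_variant(sequence: str, pos: int, ref: str, alt: str) -> str:
    """Return `sequence` with the `ref` allele at 0-based `pos` replaced by `alt`.

    Raises ValueError if the sequence does not carry `ref` at `pos`, which
    guards against coordinate or reference-genome mismatches.
    """
    observed = sequence[pos:pos + len(ref)]
    if observed != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {observed!r}")
    return sequence[:pos] + alt + sequence[pos + len(ref):]


reference = "ACGTACGTAC"  # toy reference window

# Single-nucleotide variant: T -> G at position 3.
snv_sequence = apply_variant(reference, 3, "T", "G")

# Deletion written VCF-style with an anchor base: GTAC -> G at position 2.
deletion_sequence = apply_variant(reference, 2, "GTAC", "G")

print(reference, snv_sequence, deletion_sequence)
```

Each (reference, perturbed) pair could then be passed to a sequence-based model and the two predictions compared. Note that a deletion shortens the window, so a real pipeline would need to pad or re-centre the perturbed sequence to match the model's fixed input length.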
Affiliation(s)
- Ketrin Gjoni
  - Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Katherine S Pollard
  - Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, United States
|
14
|
Zhao Q, Yang S, Hao S, Chen Z, Tang L, Wu Z, Wu J, Xu M, Ma Z, Zhou L, Xu J, Qin Q. Identification of transcriptionally-active human papillomavirus integrants through nanopore sequencing reveals viable targets for gene therapy against cervical cancer. J Med Virol 2024; 96:e29769. [PMID: 38932482 DOI: 10.1002/jmv.29769] [Received: 04/09/2024] [Revised: 06/13/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024]
Abstract
Integration of the human papillomavirus (HPV) genome into the cellular genome is a key event that leads to constitutive expression of viral oncoprotein E6/E7 and drives the progression of cervical cancer. However, HPV integration patterns differ on a case-by-case basis among related malignancies. Next-generation sequencing technologies still face challenges for interrogating HPV integration sites. In this study, utilizing Nanopore long-read sequencing, we identified 452 and 108 potential integration sites from the cervical cancer cell lines (CaSki and HeLa) and five tissue samples, respectively. Based on long Nanopore chimeric reads, we were able to analyze the methylation status of the HPV long control region (LCR), which controls oncogene E6/E7 expression, and to identify transcriptionally-active integrants among the numerous integrants. As a proof of concept, we identified an active HPV integrant in between RUNX2 and CLIC5 on chromosome 6 in the CaSki cell line, which was supported by ATAC-seq, H3K27Ac ChIP-seq, and RNA-seq analysis. Knockout of the active HPV integrant, by the CRISPR/Cas9 system, dramatically crippled cell proliferation and induced cell senescence. In conclusion, identifying transcriptionally-active HPV integrants with Nanopore sequencing can provide viable targets for gene therapy against HPV-associated cancers.
Affiliation(s)
- Qianqian Zhao
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
  - Computational Systems Biology Laboratory, Department of Bioinformatics, Shantou University Medical College, Shantou, China
- Shuaibing Yang
  - Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Shijia Hao
  - Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Zejia Chen
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Lihua Tang
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Zhaoting Wu
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Jiaxin Wu
  - Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Mingqian Xu
  - Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Zebiao Ma
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Li Zhou
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Jianzhen Xu
  - Computational Systems Biology Laboratory, Department of Bioinformatics, Shantou University Medical College, Shantou, China
- Qingsong Qin
  - Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
  - Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
  - International Science and Technology Collaboration Center for Emerging Infectious Diseases, Shantou University Medical College, Shantou, China
|
15
|
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] Open
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
  - State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
|
16
|
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
  - NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
- Patrick Driguez
  - Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
|
17
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] Open
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Can Luo
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Staunton G Golding
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Jacob B Ioffe
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Xin Maizie Zhou
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
  - Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
|
18
|
Wang Y, Chen Y, Gao J, Xie H, Guo Y, Yang J, Liu J, Chen Z, Li Q, Li M, Ren J, Wen L, Tang F. Mapping crossover events of mouse meiotic recombination by restriction fragment ligation-based Refresh-seq. Cell Discov 2024; 10:26. [PMID: 38443370 PMCID: PMC10915157 DOI: 10.1038/s41421-023-00638-9] [Received: 06/02/2023] [Accepted: 12/11/2023] [Indexed: 03/07/2024] Open
Abstract
Single-cell whole-genome sequencing methods have undergone great improvements over the past decade. However, allele dropout, which means the inability to detect both alleles simultaneously in an individual diploid cell, largely restricts the application of these methods particularly for medical applications. Here, we develop a new single-cell whole-genome sequencing method based on third-generation sequencing (TGS) platform named Refresh-seq (restriction fragment ligation-based genome amplification and TGS). It is based on restriction endonuclease cutting and ligation strategy in which two alleles in an individual cell can be cut into equal fragments and tend to be amplified simultaneously. As a new single-cell long-read genome sequencing method, Refresh-seq features much lower allele dropout rate compared with SMOOTH-seq. Furthermore, we apply Refresh-seq to 688 sperm cells and 272 female haploid cells (secondary polar bodies and parthenogenetic oocytes) from F1 hybrid mice. We acquire high-resolution genetic map of mouse meiosis recombination at low sequencing depth and reveal the sexual dimorphism in meiotic crossovers. We also phase the structure variations (deletions and insertions) in sperm cells and female haploid cells with high precision. Refresh-seq shows great performance in screening aneuploid sperm cells and oocytes due to the low allele dropout rate and has great potential for medical applications such as preimplantation genetic diagnosis.
Affiliation(s)
- Yan Wang
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Yijun Chen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Junpeng Gao
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Emergency Center, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Haoling Xie
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Yuqing Guo
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jingwei Yang
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jun'e Liu
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Zonggui Chen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Qingqing Li
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Mengyao Li
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jie Ren
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Lu Wen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China.
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
  - Changping Laboratory, Beijing, China.
|
19
|
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023]
Abstract
Background: The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. Each tool widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions: These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
  - Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
  - Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
|
20
|
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. bioRxiv 2023:2023.11.03.565556. [PMID: 37961123 PMCID: PMC10635135 DOI: 10.1101/2023.11.03.565556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Computationally editing genome sequences is a common bioinformatics task, but current approaches have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing in silico mutagenesis. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
Affiliation(s)
- Ketrin Gjoni
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
- Katherine S Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
|
21
|
Geoffroy V, Lamouche JB, Guignard T, Nicaise S, Kress A, Scheidecker S, Le Béchec A, Muller J. The AnnotSV webserver in 2023: updated visualization and ranking. Nucleic Acids Res 2023:7175348. [PMID: 37216590 DOI: 10.1093/nar/gkad426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/26/2023] [Revised: 04/20/2023] [Accepted: 05/09/2023] [Indexed: 05/24/2023]
Abstract
Much of the human genetics variant repertoire is composed of single nucleotide variants (SNV) and small insertions/deletions (indels), but structural variants (SV) remain a major part of our modified DNA. SV detection has often been a complex question to answer, either because of the necessity to use different technologies (array CGH, SNP array, Karyotype, Optical Genome Mapping…) to detect each category of SV or to get an appropriate resolution (Whole Genome Sequencing). Thanks to the deluge of pangenomic analysis, human geneticists are accumulating SV, and their interpretation remains time consuming and challenging. The AnnotSV webserver (https://www.lbgi.fr/AnnotSV/) aims at being an efficient tool to (i) annotate and interpret SV potential pathogenicity in the context of human diseases, (ii) recognize potential false positive variants from all the SV identified and (iii) visualize the patient variants repertoire. The most recent developments in the AnnotSV webserver are: (i) updated annotation sources and ranking, (ii) three novel output formats to allow diverse utilization (analysis, pipelines), as well as (iii) two novel user interfaces including an interactive circos view.
Affiliation(s)
- Véronique Geoffroy
  - Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
  - Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Jean-Baptiste Lamouche
  - Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
  - Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Samuel Nicaise
  - Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Arnaud Kress
  - Complex Systems and Translational Bioinformatics, ICube, UMR 7357, University of Strasbourg, CNRS, FMTS, Strasbourg, France
- Sophie Scheidecker
  - Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
  - Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Antony Le Béchec
  - Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Jean Muller
  - Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
  - Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
  - Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
|
22
|
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation compared to single nucleotide variants (SNV) in different plant species. Additionally, SV were associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches have been described for linked-read-based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detected large SV were weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest the usage of the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany
|