1
Sun Y, Zhao X, Fan X, Wang M, Li C, Liu Y, Wu P, Yan Q, Sun L. Assessing the impact of sequencing platforms and analytical pipelines on whole-exome sequencing. Front Genet 2024; 15:1334075. [PMID: 38818042 PMCID: PMC11137314 DOI: 10.3389/fgene.2024.1334075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/30/2024] [Indexed: 06/01/2024]
Affiliation(s)
- Yanping Sun
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Xiaochao Zhao
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Xue Fan
  - Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Miao Wang
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Chaoyang Li
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Yongfeng Liu
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Ping Wu
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Qin Yan
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Lei Sun
  - GeneMind Biosciences Company Limited, Shenzhen, China

2
Yu L, Zhang Y, Wang D, Li L, Zhang R, Li J. Harmonizing tumor mutational burden analysis: Insights from a multicenter study using in silico reference data sets in clinical whole-exome sequencing (WES). Am J Clin Pathol 2024:aqae056. [PMID: 38733635 DOI: 10.1093/ajcp/aqae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 04/13/2024] [Indexed: 05/13/2024]
Abstract
OBJECTIVES Tumor mutational burden (TMB) is a significant biomarker for predicting immune checkpoint inhibitor response, but the clinical performance of whole-exome sequencing (WES)-based TMB estimation has received less attention compared to panel-based methods. This study aimed to assess the reliability and comparability of WES-based TMB analysis among laboratories under routine testing conditions. METHODS A multicenter study was conducted involving 24 laboratories in China using in silico reference data sets. The accuracy and comparability of TMB estimation were evaluated using matched tumor-normal data sets. Factors such as accuracy of variant calls, limit of detection (LOD) of WES test, size of regions of interest (ROIs) used for TMB calculation, and TMB cutoff points were analyzed. RESULTS The laboratories consistently underestimated the expected TMB scores in matched tumor-normal samples, with only 50% falling within the ±30% TMB interval. Samples with low TMB score (<2.5) received the consensus interpretation. Accuracy of variant calls, LOD of the WES test, ROI, and TMB cutoff points were important factors causing interlaboratory deviations. CONCLUSIONS This study highlights real-world challenges in WES-based TMB analysis that need to be improved and optimized. This research will aid in the selection of more reasonable analytical procedures to minimize potential methodologic biases in estimating TMB in clinical exome sequencing tests. Harmonizing TMB estimation in clinical testing conditions is crucial for accurately evaluating patients' response to immunotherapy.
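The abstract names the size of the region of interest (ROI) and the ±30% interval as sources of interlaboratory deviation. The sketch below illustrates why, under the common convention that TMB is the number of somatic mutations per megabase of ROI. The abstract does not spell out the study's own formula, so the function names, the convention, and the example numbers are assumptions for illustration only.

```python
def tmb_score(n_somatic_mutations: int, roi_size_bp: int) -> float:
    """Return TMB as somatic mutations per megabase of the region of interest.

    Assumes the common mutations/Mb convention; the study's exact
    definition is not given in the abstract.
    """
    if roi_size_bp <= 0:
        raise ValueError("roi_size_bp must be positive")
    return n_somatic_mutations / (roi_size_bp / 1_000_000)


def within_tolerance(observed: float, expected: float, tolerance: float = 0.30) -> bool:
    """True if observed lies within +/- tolerance (a fraction) of expected."""
    return abs(observed - expected) <= tolerance * expected


# Hypothetical laboratory result: 260 somatic mutations over a 35 Mb ROI,
# compared against a hypothetical expected score of 10 mutations/Mb.
observed = tmb_score(260, 35_000_000)
print(round(observed, 2), within_tolerance(observed, 10.0))
```

Because the ROI size sits in the denominator, two laboratories reporting the same mutation count over different ROIs will report different scores, which is one way the deviations described above can arise.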
Affiliation(s)
- Lijia Yu
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Yuanfeng Zhang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Duo Wang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Lin Li
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Rui Zhang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Jinming Li
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
  - National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China

3
Sergi A, Beltrame L, Marchini S, Masseroli M. Integrated approach to generate artificial samples with low tumor fraction for somatic variant calling benchmarking. BMC Bioinformatics 2024; 25:180. [PMID: 38720249 PMCID: PMC11077792 DOI: 10.1186/s12859-024-05793-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 04/19/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity; this poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing sensitivity and detection of HTS approaches in such cases is paramount, but time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed for the generation of artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants and demonstrating their effectiveness in benchmarking low-fraction variant calling. RESULTS Our approach enables the generation of artificial raw reads that mimic real data without relying on pre-existing data by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. Then, it incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To prove the suitability of the created artificial datasets for low-fraction variant calling benchmarking, we used them as ground truth to evaluate the performance of widely-used variant calling algorithms: they allowed us to define tuned parameter values of major variant callers, considerably improving their detection of very low-fraction variants. CONCLUSIONS Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, facilitating rapid prototyping and benchmarking of algorithms for such dataset type, as well as the important need of advancing low-fraction variant calling techniques.
Affiliation(s)
- Aldo Sergi
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
  - IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy
- Luca Beltrame
  - IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy
- Sergio Marchini
  - IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy

4
Laguna JC, Pastor B, Nalda I, Hijazo-Pechero S, Teixido C, Potrony M, Puig-Butillé JA, Mezquita L. Incidental pathogenic germline alterations detected through liquid biopsy in patients with solid tumors: prevalence, clinical utility and implications. Br J Cancer 2024; 130:1420-1431. [PMID: 38532104 PMCID: PMC11059286 DOI: 10.1038/s41416-024-02607-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/14/2024] [Accepted: 01/25/2024] [Indexed: 03/28/2024]
Abstract
Liquid biopsy, a minimally invasive approach for detecting tumor biomarkers in blood, has emerged as a leading-edge technique in cancer precision medicine. New evidence has shown that liquid biopsies can incidentally detect pathogenic germline variants (PGVs) associated with cancer predisposition, including in patients with a cancer for which genetic testing is not recommended. The ability to detect these incidental PGV in cancer patients through liquid biopsy raises important questions regarding the management of this information and its clinical implications. This incidental identification of PGVs raises concerns about cancer predisposition and the potential impact on patient management, not only in terms of providing access to treatment based on the tumor molecular profiling, but also the management of revealing genetic predisposition in patients and families. Understanding how to interpret this information is essential to ensure proper decision-making and to optimize cancer treatment and prevention strategies. In this review we provide a comprehensive summary of current evidence of incidental PGVs in cancer predisposition genes identified by liquid biopsy in patients with cancer. We critically review the methodological considerations of liquid biopsy as a tool for germline diagnosis, clinical utility and potential implications for cancer prevention, treatment, and research.
Affiliation(s)
- Juan Carlos Laguna
  - Medical Oncology Department, Hospital Clinic of Barcelona, Barcelona, Spain
  - Laboratory of Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
- Belén Pastor
  - Medical Oncology Department, Hospital Clinic of Barcelona, Barcelona, Spain
- Irene Nalda
  - Medical Oncology Department, Hospital Clinic of Barcelona, Barcelona, Spain
  - Laboratory of Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
- Sara Hijazo-Pechero
  - Preclinical and Experimental Research in Thoracic Tumors (PRETT), Oncobell, Bellvitge Biomedical Research Institute (IDIBELL), l'Hospitalet de Llobregat, Barcelona, Spain
- Cristina Teixido
  - Laboratory of Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
  - Department of Medicine, University of Barcelona, Barcelona, Spain
  - Department of Pathology, Hospital Clinic of Barcelona, Barcelona, Spain
- Miriam Potrony
  - Biochemistry and Molecular Genetics Department, Hospital Clínic of Barcelona, IDIBAPS, Barcelona, Spain
  - CIBER of Rare Diseases (CIBERER), Barcelona, Spain
- Joan Antón Puig-Butillé
  - CIBER of Rare Diseases (CIBERER), Barcelona, Spain
  - Molecular Biology CORE, Hospital Clínic of Barcelona, IDIBAPS, Barcelona, Spain
- Laura Mezquita
  - Medical Oncology Department, Hospital Clinic of Barcelona, Barcelona, Spain
  - Laboratory of Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
  - Department of Medicine, University of Barcelona, Barcelona, Spain

5
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024]
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan

6
Charron P, Kang M. VariantDetective: an accurate all-in-one pipeline for detecting consensus bacterial SNPs and SVs. Bioinformatics 2024; 40:btae066. [PMID: 38366603 PMCID: PMC10898327 DOI: 10.1093/bioinformatics/btae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 01/16/2024] [Accepted: 02/14/2024] [Indexed: 02/18/2024]
Abstract
MOTIVATION Genomic variations comprise a spectrum of alterations, ranging from single nucleotide polymorphisms (SNPs) to large-scale structural variants (SVs), which play crucial roles in bacterial evolution and species diversification. Accurately identifying SNPs and SVs is beneficial for subsequent evolutionary and epidemiological studies. This study presents VariantDetective (VD), a novel, user-friendly, and all-in-one pipeline combining SNP and SV calling to generate consensus genomic variants using multiple tools. RESULTS The VD pipeline accepts various file types as input to initiate SNP and/or SV calling, and benchmarking results demonstrate VD's robustness and high accuracy across multiple tested datasets when compared to existing variant calling approaches. AVAILABILITY AND IMPLEMENTATION The source code, test data, and relevant information for VD are freely accessible at https://github.com/OLF-Bioinformatics/VariantDetective under the MIT License.
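The idea of generating consensus variants from multiple tools can be illustrated with a simple voting scheme: keep a variant only if at least a minimum number of callers report it. This is a generic sketch, not VariantDetective's actual implementation; the `consensus_variants` function, the tuple representation of a variant, the tool names, and the `min_tools` threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (contig, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def consensus_variants(
    calls_by_tool: Dict[str, Iterable[Variant]], min_tools: int = 2
) -> Set[Variant]:
    """Return variants reported by at least min_tools distinct callers."""
    support: Counter = Counter()
    for calls in calls_by_tool.values():
        # set() ensures each tool contributes at most one vote per variant
        support.update(set(calls))
    return {variant for variant, n in support.items() if n >= min_tools}


# Hypothetical call sets from three unnamed callers
calls = {
    "caller_a": [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")],
    "caller_b": [("chr1", 100, "A", "G")],
    "caller_c": [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 900, "G", "A")],
}
print(sorted(consensus_variants(calls, min_tools=2)))
```

Raising `min_tools` trades recall for precision: a stricter threshold drops variants that only one caller supports, at the cost of missing true variants that a single tool detects.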
Affiliation(s)
- Philippe Charron
  - Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada
- Mingsong Kang
  - Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada

7
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Affiliation(s)
- Yury A Barbitoff
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
- Mikhail O Ushakov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Tatyana E Lazareva
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Yulia A Nasykhova
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Andrey S Glotov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Alexander V Predeus
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia

8
Abdelwahab O, Belzile F, Torkamaneh D. Performance analysis of conventional and AI-based variant callers using short and long reads. BMC Bioinformatics 2023; 24:472. [PMID: 38097928 PMCID: PMC10720095 DOI: 10.1186/s12859-023-05596-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND The accurate detection of variants is essential for genomics-based studies. Currently, various tools are designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to use different tools. Thus far, most of the existing tools were mainly developed to work on short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently shown that they can also be used for variant calling. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. RESULTS In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones for calling SNVs and INDELs using both long and short reads in most aspects. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. CONCLUSION This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
Affiliation(s)
- Omar Abdelwahab
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada
- François Belzile
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada

9
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023]
Abstract
BACKGROUND The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Affiliation(s)
- Edward S Rice
  - Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- Antton Alberdi
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- James Alfieri
  - Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
- Giridhar Athrey
  - Department of Poultry Science, Texas A&M University, College Station, TX, USA
- Jennifer R Balacco
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Philippe Bardou
  - Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
- Heath Blackmon
  - Department of Biology, Texas A&M University, College Station, TX, USA
- Mathieu Charles
  - University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
- Hans H Cheng
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laurent A F Frantz
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Cari J Hearn
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Christophe Klopp
  - Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
- Sofia Marcos
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
- Luohao Xu
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
- Wesley C Warren
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA

10
Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023; 13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023]
Abstract
Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
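A short calculation shows why sequencing depth matters so much at a VAF of 0.025%. The sketch below treats each read at a site as an independent draw that carries the variant with probability equal to the VAF, which gives a binomial model. It deliberately ignores sequencing errors, UMIs, and any caller-specific statistics, so it is not the model used by any of the tools named above; the function names and the three-read threshold are assumptions for illustration.

```python
from math import comb


def expected_alt_reads(depth: int, vaf: float) -> float:
    """Expected number of variant-supporting reads at a site."""
    return depth * vaf


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf), i.e. the chance of sampling
    at least k variant-supporting reads. Sequencing error is ignored."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return 1.0 - p_fewer


vaf = 0.00025  # 0.025%, the lowest allelic frequency mentioned in the abstract
for depth in (5_000, 20_000, 80_000):  # hypothetical depths
    print(
        depth,
        round(expected_alt_reads(depth, vaf), 2),
        round(prob_at_least_k_alt_reads(depth, vaf, k=3), 3),
    )
```

Under this simplified model, a caller that requires several supporting reads has little chance of seeing them at modest depth, which is consistent with the reported sensitivity of raw-reads-based callers to sequencing depth.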
Affiliation(s)
- Xudong Xiang
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Bowen Lu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dongyang Song
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Jie Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kunxian Shu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dan Pu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China

11
Menzel M, Ossowski S, Kral S, Metzger P, Horak P, Marienfeld R, Boerries M, Wolter S, Ball M, Neumann O, Armeanu-Ebinger S, Schroeder C, Matysiak U, Goldschmid H, Schipperges V, Fürstberger A, Allgäuer M, Eberhardt T, Niewöhner J, Blaumeiser A, Ploeger C, Haack TB, Tay TKY, Kelemen O, Pauli T, Kirchner M, Kluck K, Ott A, Renner M, Admard J, Gschwind A, Lassmann S, Kestler H, Fend F, Illert AL, Werner M, Möller P, Seufferlein TTW, Malek N, Schirmacher P, Fröhling S, Kazdal D, Budczies J, Stenzinger A. Multicentric pilot study to standardize clinical whole exome sequencing (WES) for cancer patients. NPJ Precis Oncol 2023; 7:106. [PMID: 37864096 PMCID: PMC10589320 DOI: 10.1038/s41698-023-00457-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/03/2023] [Accepted: 09/26/2023] [Indexed: 10/22/2023]
Abstract
A growing number of druggable targets and national initiatives for precision oncology necessitate broad genomic profiling for many cancer patients. Whole exome sequencing (WES) offers unbiased analysis of the entire coding sequence, segmentation-based detection of copy number alterations (CNAs), and accurate determination of complex biomarkers including tumor mutational burden (TMB), homologous recombination repair deficiency (HRD), and microsatellite instability (MSI). To assess the inter-institution variability of clinical WES, we performed a comparative pilot study between German Centers of Personalized Medicine (ZPMs) from five participating institutions. Tumor and matched normal DNA from 30 patients were analyzed using custom sequencing protocols and bioinformatic pipelines. Calling of somatic variants was highly concordant, with a positive percentage agreement (PPA) between 91 and 95% and a positive predictive value (PPV) between 82 and 95% compared with a three-institution consensus, and full agreement for 16 of 17 druggable targets. Explanations for deviations included low VAF or coverage, differing annotations, and different filter protocols. CNAs showed overall agreement for 76% of the genomic sequence, with high wet-lab variability. Complex biomarkers correlated strongly between institutions (HRD: 0.79-1, TMB: 0.97-0.99), and all institutions agreed on microsatellite instability. This study will contribute to the development of quality control frameworks for comprehensive genomic profiling and sheds light on parameters that require stringent standardization.
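The agreement metrics named above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names are hypothetical, and treating the "three-institution consensus" as "variants called by at least three institutions" is an assumption made here for the example. Variants are represented as `(chrom, pos, ref, alt)` tuples.

```python
from collections import Counter


def consensus_calls(callsets, min_support=3):
    """Variants reported by at least `min_support` call sets (assumed definition)."""
    counts = Counter(v for calls in callsets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_support}


def ppa_ppv(calls, consensus):
    """Positive percentage agreement and positive predictive value vs. a consensus.

    PPA = shared calls / consensus calls; PPV = shared calls / institution calls.
    Returns NaN for a metric whose denominator is zero.
    """
    calls = set(calls)
    shared = len(calls & consensus)
    ppa = shared / len(consensus) if consensus else float("nan")
    ppv = shared / len(calls) if calls else float("nan")
    return ppa, ppv


if __name__ == "__main__":
    # Toy variants, invented purely for illustration.
    a, b, c, d = ("1", 100, "A", "T"), ("2", 200, "G", "C"), ("3", 300, "C", "T"), ("4", 400, "T", "G")
    institutions = [{a, b, c}, {a, b}, {a, b, d}, {a, c}, {a}]
    consensus = consensus_calls(institutions, min_support=3)
    for i, calls in enumerate(institutions, start=1):
        ppa, ppv = ppa_ppv(calls, consensus)
        print(f"institution {i}: PPA={ppa:.2f} PPV={ppv:.2f}")
```

PPA penalizes consensus variants an institution missed, while PPV penalizes calls that the consensus does not support, so the two together separate sensitivity-like and precision-like disagreement.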
Affiliation(s)
- Michael Menzel
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Stephan Ossowski
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
| | - Sebastian Kral
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
| | - Patrick Metzger
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Peter Horak
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Ralf Marienfeld
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
| | - Melanie Boerries
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Comprehensive Cancer Center Freiburg (CCCF), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Steffen Wolter
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
| | - Markus Ball
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Olaf Neumann
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Sorin Armeanu-Ebinger
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Christopher Schroeder
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Uta Matysiak
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
| | - Hannah Goldschmid
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Vincent Schipperges
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Axel Fürstberger
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
| | - Michael Allgäuer
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Timo Eberhardt
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
| | - Jakob Niewöhner
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
| | - Andreas Blaumeiser
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Carolin Ploeger
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Tobias Bernd Haack
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Timothy Kwang Yong Tay
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
| | - Olga Kelemen
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Thomas Pauli
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Martina Kirchner
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Klaus Kluck
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Alexander Ott
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Marcus Renner
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
| | - Jakob Admard
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Axel Gschwind
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
| | - Silke Lassmann
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
| | - Hans Kestler
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
| | - Falko Fend
- Institute of Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
| | - Anna Lena Illert
- Department of Medicine I, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79085, Freiburg, Germany
- Medical Department for Hematology and Oncology, Klinikum Rechts der Isar, Technische Universität München, 80333, Munich, Germany
- German Cancer Consortium (DKTK) Partner Site Munich, and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Martin Werner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Peter Möller
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
| | | | - Nisar Malek
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Department of Internal Medicine I, University Hospital Tübingen, Tübingen, Germany
| | - Peter Schirmacher
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Stefan Fröhling
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Daniel Kazdal
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
| | - Jan Budczies
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany.
- Center for Personalized Medicine (ZPM), Heidelberg, Germany.
- German Cancer Consortium (DKTK), Heidelberg, Germany.
| | - Albrecht Stenzinger
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany.
- Center for Personalized Medicine (ZPM), Heidelberg, Germany.
- German Cancer Consortium (DKTK), Heidelberg, Germany.
| |
|
12
|
Zhang B, Bassani-Sternberg M. Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/21/2023] [Indexed: 10/31/2023] Open
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means of identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes, as well as non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA, and ribosomal sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advances in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
Affiliation(s)
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
| |
|
13
|
O’Sullivan B, Seoighe C. Comprehensive and realistic simulation of tumour genomic sequencing data. NAR Cancer 2023; 5:zcad051. [PMID: 37746635 PMCID: PMC10516706 DOI: 10.1093/narcan/zcad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 08/25/2023] [Accepted: 09/08/2023] [Indexed: 09/26/2023] Open
Abstract
Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive 'ground truth' data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples.
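The role of sampling error in allele frequency estimates can be illustrated with a toy simulation. This sketch is far simpler than the framework described above and is not the authors' method: the function name, the binomial read-sampling model, and the `min_alt_reads` detection filter are assumptions introduced here. It shows one generic effect, namely that frequencies observed only at sites passing a read-count filter tend to sit above the true frequency.

```python
import random
from statistics import mean


def simulate_observed_vafs(true_vaf, depth, n_sites, min_alt_reads=0, seed=0):
    """Simulate observed VAFs at `n_sites` independent sites.

    Each of `depth` reads carries the variant with probability `true_vaf`.
    Only sites with at least `min_alt_reads` variant reads are reported,
    mimicking a simple (hypothetical) detection filter.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_sites):
        alt_reads = sum(rng.random() < true_vaf for _ in range(depth))
        if alt_reads >= min_alt_reads:
            observed.append(alt_reads / depth)
    return observed


if __name__ == "__main__":
    all_sites = simulate_observed_vafs(0.05, depth=100, n_sites=2000)
    detected = simulate_observed_vafs(0.05, depth=100, n_sites=2000, min_alt_reads=5)
    print(f"mean observed VAF, all sites:      {mean(all_sites):.4f}")
    print(f"mean observed VAF, detected sites: {mean(detected):.4f}")
    print(f"fraction of sites detected:        {len(detected) / len(all_sites):.2f}")
```

Because both runs use the same seed, they draw identical read counts, so any difference between the two means comes only from the filter.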
Affiliation(s)
- Brian O’Sullivan
- School of Mathematical and Statistical Sciences, University of Galway, University Road, Galway H91 TK33, Ireland
| | - Cathal Seoighe
- School of Mathematical and Statistical Sciences, University of Galway, University Road, Galway H91 TK33, Ireland
| |
|
14
|
Rollin J, Bester R, Brostaux Y, Caglayan K, De Jonghe K, Eichmeier A, Foucart Y, Haegeman A, Koloniuk I, Kominek P, Maree H, Onder S, Posada Céspedes S, Roumi V, Šafářová D, Schumpp O, Ulubas Serce C, Sõmera M, Tamisier L, Vainio E, van der Vlugt RAA, Massart S. Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies. PeerJ 2023; 11:e15816. [PMID: 37601254 PMCID: PMC10439718 DOI: 10.7717/peerj.15816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research in virology, especially for virus discovery. Indeed, proper monitoring of the viral population requires information on the different isolates circulating in the studied area. For this purpose, HTS has greatly facilitated the sequencing of new genomes of detected viruses and their comparison. However, the bioinformatics analyses used to reconstruct genome sequences and detect single nucleotide polymorphisms (SNPs) can potentially introduce bias, an issue that has not been widely addressed so far. Therefore, more knowledge is required on the limitations of predicting SNPs based on HTS-generated sequence samples. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three samples through large-scale performance testing (PT) using three artificially designed datasets. To evaluate the impact of bioinformatics analyses, they were divided into three key steps: read pre-processing, virus-isolate identification, and variant calling. Each step was evaluated independently through an original PT design including discussion and validation between participants at each step. Overall, this work underlines key parameters influencing SNP detection and proposes recommendations for reliable variant calling for plant viruses. The identification of the closest reference, mapping parameters, and manual validation of the detections were recognized as the most impactful analysis steps for successful SNP detection. Strategies to improve the prediction of SNPs are also discussed.
Affiliation(s)
- Johan Rollin
- Laboratory of Plant Pathology—TERRA—Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
| | - Rachelle Bester
- Citrus Research International, Matieland, South Africa
- Department of Genetics, Stellenbosch University, Matieland, South Africa
| | - Yves Brostaux
- Laboratory of Statistics, Computer Science and Modelling Applied to Bioengineering, TERRA, Gembloux Agro-Bio Tech, Teaching and Research Centre, University of Liège, Gembloux, Belgium
| | - Kadriye Caglayan
- Plant Protection Department, Agricultural Faculty, Hatay Mustafa Kemal University, Hatay, Turkey
| | - Kris De Jonghe
- Fisheries and Food (ILVO), Plant Sciences Unit, Flanders Research Institute for Agriculture, Merelbeke, Belgium
| | - Ales Eichmeier
- Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czech Republic
| | - Yoika Foucart
- Fisheries and Food (ILVO), Plant Sciences Unit, Flanders Research Institute for Agriculture, Merelbeke, Belgium
| | - Annelies Haegeman
- Fisheries and Food (ILVO), Plant Sciences Unit, Flanders Research Institute for Agriculture, Merelbeke, Belgium
| | - Igor Koloniuk
- Biology Centre CAS, Ceske Budejovice, Czech Republic
| | | | - Hans Maree
- Citrus Research International, Matieland, South Africa
- Department of Genetics, Stellenbosch University, Matieland, South Africa
| | - Serkan Onder
- Department of Plant Protection, Faculty of Agriculture, Eskişehir Osmangazi University, Eskişehir, Turkey
| | - Susana Posada Céspedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
| | - Vahid Roumi
- Plant Protection Department, Faculty of Agriculture, University of Maragheh, Maragheh, Iran
| | - Dana Šafářová
- Department of Cell Biology and Genetics, Faculty of Science, Palacký University Olomouc, Olomouc, Czech Republic
| | | | - Cigdem Ulubas Serce
- Plant Production and Technologies Department, Ayhan Şahenk Faculty of Agricultural Science and Technologies, Niğde Ömer Halisdemir University, Niğde, Turkey
| | - Merike Sõmera
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
| | - Lucie Tamisier
- Pathologie Végétale, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Montfavet, France
- GAFL, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Montfavet, France
| | - Eeva Vainio
- Natural Resources Institute Finland, Helsinki, Finland
| | | | - Sebastien Massart
- Laboratory of Plant Pathology—TERRA—Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
| |
|
15
|
Wilton R, Szalay AS. Short-read aligner performance in germline variant identification. Bioinformatics 2023; 39:btad480. [PMID: 37527006 PMCID: PMC10421969 DOI: 10.1093/bioinformatics/btad480] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/07/2023] [Revised: 06/01/2023] [Accepted: 07/31/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools. RESULTS In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers: DeepVariant, FreeBayes, and GATK HaplotypeCaller. We discuss the behavior of the read aligners with regard to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant-calling performance.
Affiliation(s)
- Richard Wilton
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Alexander S Szalay
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, United States
| |
|
16
|
Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022] Open
Abstract
High-throughput sequencing data enable the comprehensive study of genomes and the variation therein. Essential for the interpretation of these genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design; yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
|
17
|
Evaluation of the Available Variant Calling Tools for Oxford Nanopore Sequencing in Breast Cancer. Genes (Basel) 2022; 13:genes13091583. [PMID: 36140751 PMCID: PMC9498802 DOI: 10.3390/genes13091583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 08/30/2022] [Accepted: 08/31/2022] [Indexed: 11/23/2022] Open
Abstract
The goal of biomarker testing in personalized medicine is to guide treatment to achieve the best possible result for each patient. Accurate and reliable identification of each individual's genomic variants is essential for the success of clinical genomics employing third-generation sequencing. Different variant calling techniques have been used and recommended by both Oxford Nanopore Technologies (ONT) and the Nanopore community. A thorough examination of the variant callers might give critical guidance for third-generation sequencing-based clinical genomics. In this study, two reference genome sample datasets (NA12878 and NA24385) and the set of high-confidence variant calls provided by the Genome in a Bottle (GIAB) consortium were used to evaluate the performance of six variant calling tools (Human-SNP-wf, Clair3, Clair, NanoCaller, Longshot, and Medaka) as an integral step in the in-house variant detection workflow. Of the six variant callers under study, Clair3 and Human-SNP-wf, which incorporates Clair3, achieved the highest performance. Results were expressed in terms of precision, recall, and F1-score, using the hap.py tool for the comparison. In conclusion, our findings give important insights for identifying accurate variants from third-generation sequencing of personal genomes using the different variant detection tools available for long-read sequencing.
|
18
|
Abstract
Whole exome sequencing (WES) is used to query DNA variants in the protein-coding parts of genomes (exomes). However, WES analysis can be challenging because of the complexity of the data. Here, we describe a consolidated protocol for unbiased WES analysis. The protocol uses three variant callers (HaplotypeCaller, FreeBayes, and DeepVariant), which have different underlying models. We provide detailed execution steps, as well as basic variant filtering, annotation, visualization, and consolidation aspects.
- Protocol to enable whole exome data analysis in an unbiased approach
- A protocol for unbiased analysis using 3 variant callers with different underlying models
- From raw data to filtered, consolidated, and annotated DNA variant calls
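The consolidation idea can be sketched as below. This is an illustration under stated assumptions rather than the protocol's actual steps: the function names, the `(chrom, pos, ref, alt)` variant keys, and the "supported by at least N callers" rule are hypothetical, and a real workflow would read normalized VCF files instead of in-memory sets.

```python
from collections import defaultdict


def consolidate_calls(callsets):
    """Map each variant to the sorted list of callers that reported it.

    `callsets` maps a caller name to a set of (chrom, pos, ref, alt) tuples.
    """
    support = defaultdict(set)
    for caller, variants in callsets.items():
        for variant in variants:
            support[variant].add(caller)
    return {variant: sorted(callers) for variant, callers in support.items()}


def filter_by_support(consolidated, min_callers):
    """Keep variants reported by at least `min_callers` callers (assumed rule)."""
    return {v: c for v, c in consolidated.items() if len(c) >= min_callers}


if __name__ == "__main__":
    # Toy calls, invented purely for illustration.
    callsets = {
        "HaplotypeCaller": {("1", 1000, "A", "G"), ("2", 2000, "C", "T")},
        "FreeBayes": {("1", 1000, "A", "G"), ("3", 3000, "G", "A")},
        "DeepVariant": {("1", 1000, "A", "G"), ("2", 2000, "C", "T")},
    }
    merged = consolidate_calls(callsets)
    for variant, callers in sorted(filter_by_support(merged, min_callers=2).items()):
        print(variant, "supported by", ", ".join(callers))
```

Keeping the list of supporting callers for every variant, rather than only an intersection, leaves the choice of stringency to a later filtering step.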
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
|
19
|
Lei Y, Meng Y, Guo X, Ning K, Bian Y, Li L, Hu Z, Anashkina AA, Jiang Q, Dong Y, Zhu X. Overview of structural variation calling: Simulation, identification, and visualization. Comput Biol Med 2022; 145:105534. [DOI: 10.1016/j.compbiomed.2022.105534] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/20/2022] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 12/11/2022]
|