1. Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA

2. Gudkov M, Thibaut L, Khushi M, Blue GM, Winlaw DS, Dunwoodie SL, Giannoulatou E. ConanVarvar: a versatile tool for the detection of large syndromic copy number variation from whole-genome sequencing data. BMC Bioinformatics 2023; 24:49. [PMID: 36792982 PMCID: PMC9930243 DOI: 10.1186/s12859-023-05154-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 01/19/2023] [Indexed: 02/17/2023] [Open Access]
Abstract
BACKGROUND: A wide range of tools are available for the detection of copy number variants (CNVs) from whole-genome sequencing (WGS) data. However, none of them focus on clinically relevant CNVs, such as those that are associated with known genetic syndromes. Such variants are often large, typically 1-5 Mb, but currently available CNV callers have been developed and benchmarked for the discovery of smaller variants. Thus, the ability of these programs to detect tens of real syndromic CNVs remains largely unknown.
RESULTS: Here we present ConanVarvar, a tool which implements a complete workflow for the targeted analysis of large germline CNVs from WGS data. ConanVarvar comes with an intuitive R Shiny graphical user interface and annotates identified variants with information about 56 associated syndromic conditions. We benchmarked ConanVarvar and four other programs on a dataset containing real and simulated syndromic CNVs larger than 1 Mb. In comparison to other tools, ConanVarvar reports 10-30 times fewer false-positive variants without compromising sensitivity and is quicker to run, especially on large batches of samples.
CONCLUSIONS: ConanVarvar is a useful instrument for primary analysis in disease sequencing studies, where large CNVs could be the cause of disease.
Affiliation(s)
- Mikhail Gudkov
  - Victor Chang Cardiac Research Institute, Sydney, NSW 2010, Australia
  - School of Biomedical Engineering, The University of Sydney, Sydney, NSW 2006, Australia
  - St Vincent’s Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW 2010, Australia
- Loïc Thibaut
  - Victor Chang Cardiac Research Institute, Sydney, NSW 2010, Australia
  - School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW 2052, Australia
- Matloob Khushi
  - School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
- Gillian M. Blue
  - Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
  - Heart Centre for Children, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia
- David S. Winlaw
  - Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
  - Heart Centre for Children, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia
- Sally L. Dunwoodie
  - Victor Chang Cardiac Research Institute, Sydney, NSW 2010, Australia
  - St Vincent’s Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052, Australia
- Eleni Giannoulatou
  - Victor Chang Cardiac Research Institute, Sydney, NSW 2010, Australia
  - St Vincent’s Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW 2010, Australia

3. Talsania K, Shen TW, Chen X, Jaeger E, Li Z, Chen Z, Chen W, Tran B, Kusko R, Wang L, Pang AWC, Yang Z, Choudhari S, Colgan M, Fang LT, Carroll A, Shetty J, Kriga Y, German O, Smirnova T, Liu T, Li J, Kellman B, Hong K, Hastie AR, Natarajan A, Moshrefi A, Granat A, Truong T, Bombardi R, Mankinen V, Meerzaman D, Mason CE, Collins J, Stahlberg E, Xiao C, Wang C, Xiao W, Zhao Y. Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Genome Biol 2022; 23:255. [PMID: 36514120 PMCID: PMC9746098 DOI: 10.1186/s13059-022-02816-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Accepted: 11/17/2022] [Indexed: 12/15/2022] [Open Access]
Abstract
BACKGROUND: The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocations, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and matched normal control derived from the same donor, which were used in our companion benchmarking studies as reference samples.
RESULTS: We systematically investigated somatic SVs in the reference cancer cell line by comparing to a matched normal cell line using multiple NGS platforms including Illumina short-read, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set of 1788 SVs, including 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends, for the reference cancer cell line. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy.
CONCLUSIONS: A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
Affiliation(s)
- Keyur Talsania
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tsai-wei Shen
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Xiongfong Chen
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erich Jaeger
  - Illumina Inc, Foster City, CA, USA
- Zhipan Li
  - Sentieon Inc, Mountain View, CA, USA
- Zhong Chen
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Bao Tran
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Limin Wang
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Zhaowei Yang
  - Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Sulbha Choudhari
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Michael Colgan
  - Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Li Tai Fang
  - Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc, 1301 Shoreway Road, Belmont, CA 94002, USA
- Andrew Carroll
  - DNAnexus, Mountain View, CA, USA
- Jyoti Shetty
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yuliya Kriga
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Oksana German
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tatyana Smirnova
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tiantain Liu
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Jing Li
  - Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Ben Kellman
  - Bionano Genomics, San Diego, CA 92121, USA
- Karl Hong
  - Bionano Genomics, San Diego, CA 92121, USA
- Alex R. Hastie
  - Bionano Genomics, San Diego, CA 92121, USA
- Aparna Natarajan
  - Illumina Inc, Foster City, CA, USA
- Ali Moshrefi
  - Illumina Inc, Foster City, CA, USA
- Tiffany Truong
  - Illumina Inc, Foster City, CA, USA
- Robin Bombardi
  - Illumina Inc, Foster City, CA, USA
- Daoud Meerzaman
  - Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Christopher E. Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Jack Collins
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Eric Stahlberg
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Charles Wang
  - Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wenming Xiao
  - Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Yongmei Zhao
  - Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
  - Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA

4. Xiao C, Chen Z, Chen W, Padilla C, Colgan M, Wu W, Fang LT, Liu T, Yang Y, Schneider V, Wang C, Xiao W. Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples. Genome Biol 2022; 23:237. [PMID: 36352452 PMCID: PMC9648002 DOI: 10.1186/s13059-022-02803-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/05/2021] [Accepted: 10/25/2022] [Indexed: 11/11/2022] [Open Access]
Abstract
BACKGROUND: The use of a personalized haplotype-specific genome assembly, rather than an unrelated, mosaic genome like GRCh38, as a reference for detecting the full spectrum of somatic events from cancers has long been advocated but has never been explored in tumor-normal paired samples. Here, we provide the first demonstrated use of a de novo assembled personalized genome as a reference for cancer mutation detection and quantify the effects of the reference genome on the accuracy of somatic mutation detection.
RESULTS: We generate de novo assemblies of the first tumor-normal paired genomes, both nuclear and mitochondrial, derived from the same individual with triple-negative breast cancer. The personalized genome was chromosomal scale, haplotype phased, and annotated. We demonstrate that it provides individual-specific haplotypes for complex regions and medically relevant genes. We illustrate that the personalized genome reference not only improves read alignments for both short-read and long-read sequencing data but also increases the detection accuracy of somatic SNVs and SVs. We identify the equivalent somatic mutation calls between the two genome references and uncover novel somatic mutations only when the personalized genome assembly is used as a reference.
CONCLUSIONS: Our findings demonstrate that use of a personalized genome with individual-specific haplotypes is essential for accurate detection of the full spectrum of somatic mutations in paired tumor-normal samples. The unique resource and methodology established in this study will be beneficial to the development of precision oncology medicine not only for breast cancer, but also for other cancers.
Affiliation(s)
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894, USA
- Zhong Chen
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350, USA
- Wanqiu Chen
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350, USA
- Cory Padilla
  - Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066, USA
- Michael Colgan
  - The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
- Wenjun Wu
  - Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Li-Tai Fang
  - Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Road, Belmont, CA 94002, USA
- Tiantian Liu
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350, USA
- Yibin Yang
  - Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Valerie Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894, USA
- Charles Wang
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350, USA
- Wenming Xiao
  - The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA

5. Espejo Valle-Inclan J, Besselink NJ, de Bruijn E, Cameron DL, Ebler J, Kutzera J, van Lieshout S, Marschall T, Nelen M, Priestley P, Renkens I, Roemer MG, van Roosmalen MJ, Wenger AM, Ylstra B, Fijneman RJ, Kloosterman WP, Cuppen E. A multi-platform reference for somatic structural variation detection. Cell Genom 2022; 2:100139. [PMID: 36778136 PMCID: PMC9903816 DOI: 10.1016/j.xgen.2022.100139] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/10/2020] [Revised: 05/06/2021] [Accepted: 05/06/2022] [Indexed: 10/18/2022]
Abstract
Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality, gold-standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines. Here, we performed somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different sequencing technologies. Based on the evidence from multiple technologies combined with extensive experimental validation, we compiled a comprehensive set of carefully curated and validated somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects. The truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
Affiliation(s)
- Nicolle J.M. Besselink
  - Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Daniel L. Cameron
  - Hartwig Medical Foundation, Amsterdam, the Netherlands
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Joachim Kutzera
  - Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marcel Nelen
  - Department of Human Genetics, Radboud UMC, Nijmegen, the Netherlands
- Ivo Renkens
  - Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Margaretha G.M. Roemer
  - Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Bauke Ylstra
  - Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Remond J.A. Fijneman
  - Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Wigard P. Kloosterman (corresponding author)
  - Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Edwin Cuppen (corresponding author)
  - Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
  - Hartwig Medical Foundation, Amsterdam, the Netherlands

6. Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, Johanson E, Boja E, Maier EJ, Serang O, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Kyriakidis K, Malousi A, Shafin K, Pesout T, Jain M, Paten B, Chang PC, Kolesnikov A, Nattestad M, Baid G, Goel S, Yang H, Carroll A, Eveleigh R, Bourgey M, Bourque G, Li G, Ma C, Tang L, Du Y, Zhang S, Morata J, Tonda R, Parra G, Trotta JR, Brueffer C, Demirkaya-Budak S, Kabakci-Zorlu D, Turgut D, Kalay Ö, Budak G, Narcı K, Arslan E, Brown R, Johnson IJ, Dolgoborodov A, Semenyuk V, Jain A, Tetikol HS, Jain V, Ruehle M, Lajoie B, Roddey C, Catreux S, Mehio R, Ahsan MU, Liu Q, Wang K, Ebrahim Sahraeian SM, Fang LT, Mohiyuddin M, Hung C, Jain C, Feng H, Li Z, Chen L, Sedlazeck FJ, Zook JM. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genom 2022; 2:S2666-979X(22)00058-1. [PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 12/07/2020] [Revised: 11/01/2021] [Accepted: 04/08/2022] [Indexed: 11/19/2022]
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Affiliation(s)
- Nathan D. Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Elaine Johanson
  - Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Emily Boja
  - Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Ezekiel J. Maier
  - Booz Allen Hamilton, 8283 Greensboro Drive, Mclean, VA 22102, USA
- Omar Serang
  - DNAnexus, Inc., 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Adrián Muñoz-Barrera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
  - Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
  - Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain
- Konstantinos Kyriakidis
  - School of Pharmacy, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
  - Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Andigoni Malousi
  - Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
  - Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Pi-Chuan Chang
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Maria Nattestad
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Gunjan Baid
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Sidharth Goel
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Howard Yang
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Andrew Carroll
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Robert Eveleigh
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Mathieu Bourgey
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Guillaume Bourque
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Gen Li
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ChouXian Ma
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- LinQi Tang
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- YuanPing Du
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ShaoWei Zhang
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- Jordi Morata
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Raúl Tonda
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Genís Parra
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jean-Rémi Trotta
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Christian Brueffer
  - Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Deniz Turgut
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Özem Kalay
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Gungor Budak
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Kübra Narcı
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Elif Arslan
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Amit Jain
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li Tai Fang
  - Roche Sequencing Solutions, Santa Clara, CA 95050, USA
- Chirag Jain
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Justin M. Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA

7
Cortés-Ciriano I, Gulhan DC, Lee JJ, Melloni GEM, Park PJ. Computational analysis of cancer genome sequencing data. Nat Rev Genet 2022; 23:298-314. [PMID: 34880424 DOI: 10.1038/s41576-021-00431-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Accepted: 10/26/2021] [Indexed: 02/07/2023]
Abstract
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.

8
Yang L. Meerkat: An Algorithm to Reliably Identify Structural Variations and Predict Their Forming Mechanisms. Methods Mol Biol 2022; 2493:107-135. [PMID: 35751812 PMCID: PMC11079867 DOI: 10.1007/978-1-0716-2293-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Next-generation sequencing technologies have been widely used to query genetic variants in normal individuals as well as in those with diseases. Large-scale structural variations are a common source of genetic diversity in human populations, and some of them contribute significantly to the etiology of diseases. However, the detection of large-scale structural variations from sequencing data remains challenging. Here, we describe Meerkat, an algorithm which can reliably detect structural variations from Illumina short-read sequencing data at basepair resolution. A unique feature of Meerkat is that it can infer the variant forming mechanisms based on the DNA content and features at the breakpoints.
Affiliation(s)
- Lixing Yang
- Ben May Department for Cancer Research, Department of Human Genetics, Comprehensive Cancer Center, University of Chicago, Chicago, IL, USA.

9
Tanner G, Westhead DR, Droop A, Stead LF. Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat Commun 2021; 12:6396. [PMID: 34737285 DOI: 10.1038/s41467-021-26698-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/06/2021] [Accepted: 10/20/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.

10
Fang LT, Zhu B, Zhao Y, Chen W, Yang Z, Kerrigan L, Langenbach K, de Mars M, Lu C, Idler K, Jacob H, Zheng Y, Ren L, Yu Y, Jaeger E, Schroth GP, Abaan OD, Talsania K, Lack J, Shen TW, Chen Z, Stanbouly S, Tran B, Shetty J, Kriga Y, Meerzaman D, Nguyen C, Petitjean V, Sultan M, Cam M, Mehta M, Hung T, Peters E, Kalamegham R, Sahraeian SME, Mohiyuddin M, Guo Y, Yao L, Song L, Lam HYK, Drabek J, Vojta P, Maestro R, Gasparotto D, Kõks S, Reimann E, Scherer A, Nordlund J, Liljedahl U, Jensen RV, Pirooznia M, Li Z, Xiao C, Sherry ST, Kusko R, Moos M, Donaldson E, Tezak Z, Ning B, Tong W, Li J, Duerken-Hughes P, Catalanotti C, Maheshwari S, Shuga J, Liang WS, Keats J, Adkins J, Tassone E, Zismann V, McDaniel T, Trent J, Foox J, Butler D, Mason CE, Hong H, Shi L, Wang C, Xiao W. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 2021; 39:1151-1160. [PMID: 34504347 PMCID: PMC8532138 DOI: 10.1038/s41587-021-00993-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/02/2019] [Accepted: 06/18/2021] [Indexed: 02/08/2023]
Abstract
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
Affiliation(s)
- Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Yongmei Zhao
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Zhaowei Yang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Liz Kerrigan
- ATCC (American Type Culture Collection), Manassas, VA, USA
| | | | | | - Charles Lu
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
| | - Kenneth Idler
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
| | - Howard Jacob
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | | | | | | | - Keyur Talsania
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Justin Lack
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Tsai-Wei Shen
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Seta Stanbouly
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
| | - Cu Nguyen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
| | - Virginie Petitjean
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Marc Sultan
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Margaret Cam
- CCR Collaborative Bioinformatics Resource (CCBR), Office of Science and Technology Resources, Center for Cancer Research, Bethesda, MD, USA
| | - Monika Mehta
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Tiffany Hung
- Genentech, a member of the Roche group, South San Francisco, CA, USA
| | - Eric Peters
- Genentech, a member of the Roche group, South San Francisco, CA, USA
| | - Rasika Kalamegham
- Genentech, a member of the Roche group, South San Francisco, CA, USA
| | | | - Marghoob Mohiyuddin
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Yunfei Guo
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Lijing Yao
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Lei Song
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hugo Y K Lam
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Jiri Drabek
- IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Petr Vojta
- IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Roberta Maestro
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
| | - Daniela Gasparotto
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
| | - Sulev Kõks
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Ene Reimann
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Andreas Scherer
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Jessica Nordlund
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Ulrika Liljedahl
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Roderick V Jensen
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA, USA
| | - Mehdi Pirooznia
- Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Zhipan Li
- Sentieon Inc., Mountain View, CA, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | - Malcolm Moos
- Center for Biologics Evaluation and Research, FDA, Silver Spring, MD, USA
| | - Eric Donaldson
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
| | - Zivana Tezak
- Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA
| | - Baitang Ning
- National Center for Toxicological Research, FDA, Jefferson, AR, USA
| | - Weida Tong
- National Center for Toxicological Research, FDA, Jefferson, AR, USA
| | - Jing Li
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | | | | | | | | | - Winnie S Liang
- Translational Genomics Research Institute, Phoenix, AZ, USA
| | - Jonathan Keats
- Translational Genomics Research Institute, Phoenix, AZ, USA
| | | | - Erica Tassone
- Translational Genomics Research Institute, Phoenix, AZ, USA
| | | | | | - Jeffrey Trent
- Translational Genomics Research Institute, Phoenix, AZ, USA
| | - Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Daniel Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Huixiao Hong
- National Center for Toxicological Research, FDA, Jefferson, AR, USA.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
| | - Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA.
- Department of Basic Science, Loma Linda University School of Medicine, Loma Linda, CA, USA.
| | - Wenming Xiao
- Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA.

11
Creason A, Haan D, Dang K, Chiotti KE, Inkman M, Lamb A, Yu T, Hu Y, Norman TC, Buchanan A, van Baren MJ, Spangler R, Rollins MR, Spellman PT, Rozanov D, Zhang J, Maher CA, Caloian C, Watson JD, Uhrig S, Haas BJ, Jain M, Akeson M, Ahsen ME, Stolovitzky G, Guinney J, Boutros PC, Stuart JM, Ellrott K. A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery. Cell Syst 2021; 12:827-838.e5. [PMID: 34146471 PMCID: PMC8376800 DOI: 10.1016/j.cels.2021.05.021] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/01/2019] [Revised: 09/15/2020] [Accepted: 05/25/2021] [Indexed: 02/03/2023]
Abstract
The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Allison Creason
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - David Haan
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Kami E. Chiotti
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Matthew Inkman
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | | | | | - Yin Hu
- Sage Bionetworks, Seattle, WA, USA
| | | | - Alex Buchanan
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Marijke J. van Baren
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Spangler
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - M. Rick Rollins
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Paul T. Spellman
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Dmitri Rozanov
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Jin Zhang
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | - Christopher A. Maher
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | - Cristian Caloian
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada
| | - John D. Watson
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada
| | - Sebastian Uhrig
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ) and Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Brian J. Haas
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Miten Jain
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark Akeson
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mehmet Eren Ahsen
- Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, One Gustave Levy Place, New York, NY 1498, USA
| | | | - Gustavo Stolovitzky
- Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, One Gustave Levy Place, New York, NY 1498, USA
- IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA
| | | | - Paul C. Boutros
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada
- Departments of Medical Biophysics and Pharmacology & Toxicology, University of Toronto, Toronto, Canada
- Departments of Human Genetics and Urology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Joshua M. Stuart
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Kyle Ellrott
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
- Lead contact

12
Min YK, Park KS. The Application of Control Materials for Ongoing Quality Management of Next-Generation Sequencing in a Clinical Genetic Laboratory. Medicina (Kaunas) 2021; 57:543. [PMID: 34071304 PMCID: PMC8227145 DOI: 10.3390/medicina57060543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2021] [Revised: 05/25/2021] [Accepted: 05/25/2021] [Indexed: 11/16/2022]
Abstract
Next-generation sequencing (NGS) has played an important role in detecting genetic variants with pathologic and therapeutic potential. The advantages of NGS, such as high-throughput sequencing capacity and massively parallel sequencing, have a significant impact on the realization of genetic profiling in clinical genetic laboratories. These changes have enabled clinicians to execute precision medicine in diagnosis, prognosis, and treatment for patients. However, to adapt targeted gene panels for diagnostic use, analytical validation and ongoing quality control should be implemented and applied with both practical guidelines and appropriate control materials. Several guidelines for NGS quality control recommend the use of control materials such as HapMap cell lines, synthetic DNA fragments, and genetically characterized cell lines; however, the specifications or applications of such use are insufficient to guide method development. This review focuses on what factors should be considered before selecting control materials for an NGS assay and on practical methods for developing them in clinical genetic laboratories. This review also provides detailed sources of critical information related to control materials.
Affiliation(s)
- Young-Kyu Min
- Department of Medical Laser, Dankook University, Chungnam 31116, Korea
- Department of Laboratory Medicine, Severance Hospital, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea
| | - Kyung-Sun Park
- Department of Laboratory Medicine, Kyung Hee University School of Medicine and Kyung Hee University Medical Center, Seoul 02447, Korea
- Correspondence: Tel.: +82-2-958-8674

13
Li Z, Fang S, Zhang R, Yu L, Zhang J, Bu D, Sun L, Zhao Y, Li J. VarBen. J Mol Diagn 2021; 23:285-299. [DOI: 10.1016/j.jmoldx.2020.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/20/2019] [Revised: 10/06/2020] [Accepted: 11/17/2020] [Indexed: 02/08/2023] [Open Access]

14
Abstract
Next-generation sequencing technologies have enabled a dramatic expansion of clinical genetic testing both for inherited conditions and diseases such as cancer. Accurate variant calling in NGS data is a critical step upon which virtually all downstream analysis and interpretation processes rely. Just as NGS technologies have evolved considerably over the past 10 years, so too have the software tools and approaches for detecting sequence variants in clinical samples. In this review, I discuss the current best practices for variant calling in clinical sequencing studies, with a particular emphasis on trio sequencing for inherited disorders and somatic mutation detection in cancer patients. I describe the relative strengths and weaknesses of panel, exome, and whole-genome sequencing for variant detection. Recommended tools and strategies for calling variants of different classes are also provided, along with guidance on variant review, validation, and benchmarking to ensure optimal performance. Although NGS technologies are continually evolving, and new capabilities (such as long-read single-molecule sequencing) are emerging, the “best practice” principles in this review should be relevant to clinical variant calling in the long term.
Affiliation(s)
- Daniel C Koboldt
- Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children's Hospital, Columbus, OH, USA.
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA.

15
Khalighi S, Singh S, Varadan V. Untangling a complex web: Computational analyses of tumor molecular profiles to decode driver mechanisms. J Genet Genomics 2020; 47:595-609. [PMID: 33423960 PMCID: PMC7902422 DOI: 10.1016/j.jgg.2020.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2020] [Revised: 11/04/2020] [Accepted: 11/14/2020] [Indexed: 12/19/2022]
Abstract
Genome-scale studies focusing on molecular profiling of cancers across tissue types have revealed a plethora of aberrations across the genomic, transcriptomic, and epigenomic scales. The significant molecular heterogeneity across individual tumors even within the same tissue context complicates decoding the key etiologic mechanisms of this disease. Furthermore, it is increasingly likely that biologic mechanisms underlying the pathobiology of cancer involve multiple molecular entities interacting across functional scales. This has motivated the development of computational approaches that integrate molecular measurements with prior biological knowledge in increasingly intricate ways to enable the discovery of driver genomic aberrations across cancers. Here, we review diverse methodological approaches that have powered significant advances in our understanding of the genomic underpinnings of cancer at the cohort and at the individual tumor scales. We outline the key advances and challenges in the computational discovery of cancer mechanisms while motivating the development of systems biology approaches to comprehensively decode the biologic drivers of this complex disease.
Affiliation(s)
- Sirvan Khalighi
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Salendra Singh
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Vinay Varadan
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.

16
Riggs K, Chen HS, Rotunno M, Li B, Simonds NI, Mechanic LE, Peng B. On the application, reporting, and sharing of in silico simulations for genetic studies. Genet Epidemiol 2020; 45:131-141. [PMID: 33063887 PMCID: PMC7984380 DOI: 10.1002/gepi.22362] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/10/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/31/2022]
Abstract
In silico simulations play an indispensable role in the development and application of statistical models and methods for genetic studies. Simulation tools allow for the evaluation of methods and investigation of models in a controlled manner. With the growing popularity of evolutionary models and simulation‐based statistical methods, genetic simulations have been applied to a wide variety of research disciplines such as population genetics, evolutionary genetics, genetic epidemiology, ecology, and conservation biology. In this review, we surveyed 1409 articles from five journals that publish on major application areas of genetic simulations. We identified 432 papers in which genetic simulations were used and examined the targets and applications of simulation studies and how these simulation methods and simulated data sets are reported and shared. Whereas a large proportion (30%) of the surveyed articles reported the use of genetic simulations, only 28% of these genetic simulation studies used existing simulation software, 2% used existing simulated data sets, and 19% and 12% made source code and simulated data sets publicly available, respectively. Moreover, 15% of articles provided no information on how simulation studies were performed. These findings suggest a need to encourage sharing and reuse of existing simulation software and data sets, as well as providing more information regarding the performance of simulations.
Affiliation(s)
- Kaleigh Riggs
- Department of Statistics, Rice University, Houston, Texas, USA
| | - Huann-Sheng Chen
- Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch, Surveillance Research Program, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA
| | - Melissa Rotunno
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
| | - Bing Li
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
| | | | - Leah E Mechanic
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
| | - Bo Peng
- Department of Medicine, Baylor College of Medicine, Houston, Texas, USA

17
SoRelle JA, Wachsmann M, Cantarel BL. Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays. Arch Pathol Lab Med 2020; 144:1118-1130. [PMID: 32045276 DOI: 10.5858/arpa.2019-0476-ra] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 12/09/2019] [Indexed: 11/06/2022]
Abstract
CONTEXT.— Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases are important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories. OBJECTIVE.— To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay. DATA SOURCES.— This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation. CONCLUSIONS.— This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) if concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.
Affiliation(s)
- Jeffrey A SoRelle
- Department of Pathology (SoRelle, Wachsmann), University of Texas Southwestern Medical Center, Dallas
| | - Megan Wachsmann
- Department of Pathology (SoRelle, Wachsmann), University of Texas Southwestern Medical Center, Dallas
| | - Brandi L Cantarel
- Bioinformatics Core Facility (Cantarel), University of Texas Southwestern Medical Center, Dallas; Department of Bioinformatics (Cantarel), University of Texas Southwestern Medical Center, Dallas; University of Texas Southwestern Medical Center, Dallas
|
18
|
Xiao Y, Wang X, Zhang H, Ulintz PJ, Li H, Guan Y. FastClone is a probabilistic tool for deconvoluting tumor heterogeneity in bulk-sequencing samples. Nat Commun 2020; 11:4469. [PMID: 32901013 PMCID: PMC7478963 DOI: 10.1038/s41467-020-18169-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/24/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023] Open
Abstract
Dissecting tumor heterogeneity is a key to understanding the complex mechanisms underlying drug resistance in cancers. The rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study that compares diverse modeling algorithms. Here we present FastClone, a top-performing algorithm in accuracy in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves approximately 100-fold acceleration in computation for both simulated and patient data. The efficacy of FastClone will allow its application to large-scale data and clinical data, and facilitate personalized medicine in cancers.
Affiliation(s)
- Yao Xiao
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Xueqing Wang
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA; Microsoft Inc., Redmond, WA, USA
| | - Peter J Ulintz
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Hongyang Li
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA.
|
19
|
Abstract
Profiling genetic variants (including single nucleotide variants, small insertions and deletions, copy number variations, and structural variations [SVs]) from both healthy individuals and individuals with disease is a key component of genetic and biomedical research. SVs are large-scale changes in the genome and involve breakage and rejoining of DNA fragments. They may affect thousands to millions of nucleotides and can lead to loss, gain, and reshuffling of genes and regulatory elements. SVs are known to impact gene expression and potentially result in altered phenotypes and diseases. Therefore, identifying SVs in human genomes is particularly important. In this review, I describe advantages and disadvantages of the available high-throughput assays for the discovery of SVs, which are the most challenging genetic alterations to detect. A practical guide is offered to suggest the most suitable strategies for discovering different types of SVs including common germline, rare, somatic, and complex variants. I also discuss factors to be considered, such as cost and performance, for different strategies when designing experiments. Last, I present several approaches to identify potential SV artifacts caused by samples, experimental procedures, and computational analysis. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Lixing Yang
- Ben May Department for Cancer Research, Department of Human Genetics, University of Chicago, Chicago, Illinois
|
20
|
Gong T, Hayes VM, Chan EKF. Shiny-SoSV: A web-based performance calculator for somatic structural variant detection. PLoS One 2020; 15:e0238108. [PMID: 32853264 DOI: 10.1371/journal.pone.0238108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2020] [Accepted: 08/10/2020] [Indexed: 11/19/2022] Open
Abstract
Somatic structural variants are an important contributor to cancer development and evolution. Accurate detection of these complex variants from whole genome sequencing data is influenced by a multitude of parameters. However, there are currently no tools for guiding study design, nor are there applications that could predict the performance of somatic structural variant detection. To address this gap, we developed Shiny-SoSV, a user-friendly web-based calculator for determining the impact of common variables on the sensitivity, precision and F1 score of somatic structural variant detection, including choice of variant detection tool, sequencing depth of coverage, variant allele fraction, and variant breakpoint resolution. Using simulation studies, we determined singular and combinatoric effects of these variables and modelled the results using a generalised additive model, allowing structural variant detection performance to be predicted for any combination of predictors. Shiny-SoSV provides an interactive and visual platform for users to easily compare the individual and combined impact of different parameters. It predicts the performance of a proposed study design for somatic structural variant detection prior to the commencement of benchwork. Shiny-SoSV is freely available at https://hcpcg.shinyapps.io/Shiny-SoSV with accompanying user’s guide and example use-cases.
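The three metrics named in this abstract are standard functions of true positive, false positive and false negative counts. The following is a minimal sketch with hypothetical counts; it is not the Shiny-SoSV model, which predicts these metrics with a generalised additive model, and the function name `detection_metrics` is an assumption made for illustration.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Return sensitivity (recall), precision and F1 score from call counts."""
    # Sensitivity: fraction of true variants that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of reported calls that are true variants.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=80, fp=20, fn=20)
print(metrics)
```

Guarding each division keeps the function defined when a callset or truth set is empty, which can occur in simulated low-coverage scenarios.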
|
21
|
Zook JM, Hansen NF, Olson ND, Chapman L, Mullikin JC, Xiao C, Sherry S, Koren S, Phillippy AM, Boutros PC, Sahraeian SME, Huang V, Rouette A, Alexander N, Mason CE, Hajirasouliha I, Ricketts C, Lee J, Tearle R, Fiddes IT, Barrio AM, Wala J, Carroll A, Ghaffari N, Rodriguez OL, Bashir A, Jackman S, Farrell JJ, Wenger AM, Alkan C, Soylev A, Schatz MC, Garg S, Church G, Marschall T, Chen K, Fan X, English AC, Rosenfeld JA, Zhou W, Mills RE, Sage JM, Davis JR, Kaiser MD, Oliver JS, Catalano AP, Chaisson MJP, Spies N, Sedlazeck FJ, Salit M. A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol 2020; 38:1347-55. [PMID: 32541955 DOI: 10.1038/s41587-020-0538-8] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Received: 07/16/2019] [Accepted: 04/28/2020] [Indexed: 12/19/2022]
Abstract
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed the first sequence-resolved benchmark set for identification of both false negative and false positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12745 isolated, sequence-resolved insertion (7281) and deletion (5464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5262 insertions and 4095 deletions supported by ≥1 diploid assembly. We demonstrate the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
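How a benchmark of this kind separates false negatives from false positives can be sketched as follows. This is a simplified illustration, not the GIAB comparison method: it assumes exact matching on hypothetical (chromosome, position, type) tuples and hypothetical region intervals, whereas real structural variant comparison needs tolerant matching on position, size and sequence. Calls outside the benchmark regions are left unassessed rather than counted as errors.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str  # "INS" or "DEL"


def in_regions(sv, regions):
    """True if the variant position lies inside any (chrom, start, end) region."""
    return any(c == sv.chrom and s <= sv.pos < e for c, s, e in regions)


def compare(calls, benchmark, regions):
    """Classify calls against a benchmark, restricted to the benchmark regions."""
    calls_in = {sv for sv in calls if in_regions(sv, regions)}
    bench_in = {sv for sv in benchmark if in_regions(sv, regions)}
    return {
        "tp": calls_in & bench_in,
        "fp": calls_in - bench_in,  # extra calls inside regions: putative false positives
        "fn": bench_in - calls_in,  # benchmark variants that were missed
    }


# Hypothetical data, for illustration only.
regions = [("chr1", 0, 10_000)]
benchmark = [SV("chr1", 1_000, "DEL"), SV("chr1", 5_000, "INS")]
calls = [SV("chr1", 1_000, "DEL"), SV("chr1", 7_000, "DEL"), SV("chr1", 50_000, "INS")]
result = compare(calls, benchmark, regions)
print({k: sorted(v) for k, v in result.items()})
```

Restricting both sets to the regions first is what allows any unmatched call inside them to be treated as a putative false positive, as the abstract describes for the Tier 1 regions.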
|
22
|
Gong T, Hayes VM, Chan EKF. Detection of somatic structural variants from short-read next-generation sequencing data. Brief Bioinform 2020; 22:5831479. [PMID: 32379294 PMCID: PMC8138798 DOI: 10.1093/bib/bbaa056] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/10/2019] [Revised: 03/05/2020] [Accepted: 03/29/2020] [Indexed: 01/09/2023] Open
Abstract
Somatic structural variants (SVs), which are variants that typically impact >50 nucleotides, play a significant role in cancer development and evolution but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of seven commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the seven SV callers examined in this paper. As the importance of large SVs becomes increasingly recognized in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection that should be considered when choosing SV callers.
Affiliation(s)
| | - Vanessa M Hayes
- Corresponding authors: Eva K.F. Chan, New South Wales Health Pathology, Newcastle, NSW 2300, Australia. E-mail: ; Vanessa M. Hayes, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia. Tel.: +61-2-9355-5841; Fax: +61 2-2-9295-8151; E-mail:
| | - Eva K F Chan
- Corresponding authors: Eva K.F. Chan, New South Wales Health Pathology, Newcastle, NSW 2300, Australia. E-mail: ; Vanessa M. Hayes, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia. Tel.: +61-2-9355-5841; Fax: +61 2-2-9295-8151; E-mail:
|
23
|
Jammula S, Katz-Summercorn AC, Li X, Linossi C, Smyth E, Killcoyne S, Biasci D, Subash VV, Abbas S, Blasko A, Devonshire G, Grantham A, Wronowski F, O'Donovan M, Grehan N, Eldridge MD, Tavaré S, Fitzgerald RC. Identification of Subtypes of Barrett's Esophagus and Esophageal Adenocarcinoma Based on DNA Methylation Profiles and Integration of Transcriptome and Genome Data. Gastroenterology 2020; 158:1682-1697.e1. [PMID: 32032585 PMCID: PMC7305027 DOI: 10.1053/j.gastro.2020.01.044] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 09/12/2019] [Revised: 01/27/2020] [Accepted: 01/29/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND & AIMS: Esophageal adenocarcinomas (EACs) are heterogeneous and often preceded by Barrett's esophagus (BE). Many genomic changes have been associated with development of BE and EAC, but little is known about epigenetic alterations. We performed epigenetic analyses of BE and EAC tissues and combined these data with transcriptome and genomic data to identify mechanisms that control gene expression and genome integrity. METHODS: In a retrospective cohort study, we collected tissue samples and clinical data from 150 BE and 285 EAC cases from the Oesophageal Cancer Classification and Molecular Stratification consortium in the United Kingdom. We analyzed methylation profiles of all BE and EAC tissues and assigned them to subgroups using non-negative matrix factorization with k-means clustering. Data from whole-genome sequencing and transcriptome studies were then incorporated; we performed integrative methylation and RNA-sequencing analyses to identify genes that were suppressed with increased methylation in promoter regions. Levels of different immune cell types were computed using single-sample gene set enrichment methods. We derived 8 organoids from 8 EAC tissues and tested their sensitivity to different drugs. RESULTS: BE and EAC samples shared genome-wide methylation features, compared with normal tissues (esophageal, gastric, and duodenum; controls) from the same patients, and grouped into 4 subtypes. Subtype 1 was characterized by DNA hypermethylation with a high mutation burden and multiple mutations in genes in cell cycle and receptor tyrosine signaling pathways. Subtype 2 was characterized by a gene expression pattern associated with metabolic processes (ATP synthesis and fatty acid oxidation) and a lack of methylation at specific binding sites for transcription factors; 83% of samples of this subtype were BE and 17% were EAC. The third subtype did not have changes in methylation pattern, compared with control tissue, but had a gene expression pattern that indicated immune cell infiltration; this tumor type was associated with the shortest time of patient survival. The fourth subtype was characterized by DNA hypomethylation associated with structural rearrangements and copy number alterations, with preferential amplification of CCNE1 (cells with this gene amplification have been reported to be sensitive to CDK2 inhibitors). Organoids with reduced levels of MGMT and CHFR expression were sensitive to temozolomide and taxane drugs. CONCLUSIONS: In a comprehensive integrated analysis of methylation, transcriptome, and genome profiles of more than 400 BE and EAC tissues, along with clinical data, we identified 4 subtypes that were associated with patient outcomes and potential responses to therapy.
Affiliation(s)
- SriGanesh Jammula
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
| | | | - Xiaodun Li
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Constanza Linossi
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Elizabeth Smyth
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Sarah Killcoyne
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
| | - Daniele Biasci
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
| | - Vinod V Subash
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Sujath Abbas
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Adrienn Blasko
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Ginny Devonshire
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
| | - Amber Grantham
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Filip Wronowski
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Maria O'Donovan
- Department of Histopathology, Cambridge University Hospital NHS Trust, Cambridge, United Kingdom
| | - Nicola Grehan
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom
| | - Matthew D Eldridge
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
| | - Simon Tavaré
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, United Kingdom; Irving Institute for Cancer Dynamics, Columbia University, New York, New York
| | - Rebecca C Fitzgerald
- MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, United Kingdom.
|
24
|
Sorn P, Holtsträter C, Löwer M, Sahin U, Weber D. ArtiFuse-computational validation of fusion gene detection tools without relying on simulated reads. Bioinformatics 2020; 36:373-379. [PMID: 31373612 DOI: 10.1093/bioinformatics/btz613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/02/2019] [Revised: 07/30/2019] [Accepted: 08/01/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples. RESULTS: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference and can therefore be applied to any RNA-seq dataset without the need for any simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values peak for all three tools between 0.4 and 0.56 for high-quality and high-coverage datasets. As ArtiFuse affords total control over involved genes and breakpoint position, we also assessed performance with regard to gene-related properties, showing a drop in recall for low-expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower than previously reported estimates based on simulated reads. Due to the use of real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for application in clinical settings. AVAILABILITY AND IMPLEMENTATION: ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Patrick Sorn
- TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany
| | - Christoph Holtsträter
- TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany
| | - Martin Löwer
- TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany
| | - Ugur Sahin
- TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany
| | - David Weber
- TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany
|
25
|
Salcedo A, Tarabichi M, Espiritu SMG, Deshwar AG, David M, Wilson NM, Dentro S, Wintersinger JA, Liu LY, Ko M, Sivanandan S, Zhang H, Zhu K, Ou Yang TH, Chilton JM, Buchanan A, Lalansingh CM, P'ng C, Anghel CV, Umar I, Lo B, Zou W, Simpson JT, Stuart JM, Anastassiou D, Guan Y, Ewing AD, Ellrott K, Wedge DC, Morris Q, Van Loo P, Boutros PC. A community effort to create standards for evaluating tumor subclonal reconstruction. Nat Biotechnol 2020; 38:97-107. [PMID: 31919445 PMCID: PMC6956735 DOI: 10.1038/s41587-019-0364-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/27/2018] [Accepted: 11/18/2019] [Indexed: 02/03/2023]
Abstract
Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.
Affiliation(s)
- Adriana Salcedo
- Ontario Institute for Cancer Research, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Maxime Tarabichi
- The Francis Crick Institute, London, UK
- Wellcome Trust Sanger Institute, Hinxton, UK
| | | | - Amit G Deshwar
- The Edward S. Rogers Senior Department of Electrical & Computer Engineering, Toronto, Canada
| | - Matei David
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Stefan Dentro
- The Francis Crick Institute, London, UK
- Wellcome Trust Sanger Institute, Hinxton, UK
| | | | - Lydia Y Liu
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Minjeong Ko
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Kaiyi Zhu
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
| | - Tai-Hsien Ou Yang
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
| | - John M Chilton
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
| | - Alex Buchanan
- Oregon Health & Sciences University, Portland, OR, USA
| | | | | | | | - Imaad Umar
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Bryan Lo
- Ontario Institute for Cancer Research, Toronto, Canada
| | - William Zou
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Joshua M Stuart
- Department of Biomolecular Engineering, Center for Biomolecular Sciences and Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Dimitris Anastassiou
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
| | - Adam D Ewing
- Mater Research Institute, University of Queensland, Woolloongabba, Queensland, Australia
| | - Kyle Ellrott
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Oregon Health & Sciences University, Portland, OR, USA
| | - David C Wedge
- Big Data Institute, University of Oxford, Oxford, UK
- Oxford NIHR Biomedical Research Centre, Oxford, UK
| | - Quaid Morris
- Ontario Institute for Cancer Research, Toronto, Canada
- Donnelly Centre, University of Toronto, Toronto, Canada
- Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vector Institute for Artificial Intelligence, Toronto, Canada
| | - Peter Van Loo
- The Francis Crick Institute, London, UK
- Department of Human Genetics, University of Leuven, Leuven, Belgium
| | - Paul C Boutros
- Department of Medical Biophysics, University of Toronto, Toronto, Canada.
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada.
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Urology, University of California, Los Angeles, Los Angeles, CA, USA.
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA.
|
26
|
Wijfjes RY, Smit S, de Ridder D. Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data. BMC Genomics 2019; 20:818. [PMID: 31699036 PMCID: PMC6836508 DOI: 10.1186/s12864-019-6153-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2019] [Accepted: 09/30/2019] [Indexed: 01/27/2023] Open
Abstract
Background: Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls. Results: To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions. Conclusions: Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
Affiliation(s)
- Raúl Y Wijfjes
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands.
| | - Sandra Smit
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
|
27
|
Zhang Y, Yang L, Kucherlapati M, Hadjipanayis A, Pantazi A, Bristow CA, Lee EA, Mahadeshwar HS, Tang J, Zhang J, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette LJ, Xi R, Chin L, Protopopov A, Park PJ, Kucherlapati R, Creighton CJ. Global impact of somatic structural variation on the DNA methylome of human cancers. Genome Biol 2019; 20:209. [PMID: 31610796 PMCID: PMC6792267 DOI: 10.1186/s13059-019-1818-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/10/2019] [Accepted: 09/09/2019] [Indexed: 12/21/2022] Open
Abstract
Background: Genomic rearrangements exert a heavy influence on the molecular landscape of cancer. New analytical approaches integrating somatic structural variants (SSVs) with altered gene features represent a framework by which we can assign global significance to a core set of genes, analogous to established methods that identify genes non-randomly targeted by somatic mutation or copy number alteration. While recent studies have defined broad patterns of association involving gene transcription and nearby SSV breakpoints, global alterations in DNA methylation in the context of SSVs remain largely unexplored. Results: By data integration of whole genome sequencing, RNA sequencing, and DNA methylation arrays from more than 1400 human cancers, we identify hundreds of genes and associated CpG islands (CGIs) for which the nearby presence of a somatic structural variant (SSV) breakpoint is recurrently associated with altered expression or DNA methylation, respectively, independently of copy number alterations. CGIs with SSV-associated increased methylation are predominantly promoter-associated, while CGIs with SSV-associated decreased methylation are enriched for gene body CGIs. Rearrangement of genomic regions normally having higher or lower methylation is often involved in SSV-associated CGI methylation alterations. Across cancers, the overall structural variation burden is associated with a global decrease in methylation, increased expression in methyltransferase genes and DNA damage response genes, and decreased immune cell infiltration. Conclusion: Genomic rearrangement appears to have a major role in shaping the cancer DNA methylome, to be considered alongside commonly accepted mechanisms including histone modifications and disruption of DNA methyltransferases.
Affiliation(s)
- Yiqun Zhang
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Lixing Yang
- Ben May Department for Cancer Research and Department of Human Genetics, The University of Chicago, Chicago, IL, 60637, USA
| | - Melanie Kucherlapati
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Angela Hadjipanayis
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Angeliki Pantazi
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA; Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB, T1K 3M4, Canada
| | - Christopher A Bristow
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Eunjung Alice Lee
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Harshad S Mahadeshwar
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Jiabin Tang
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Jianhua Zhang
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Sahil Seth
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Semin Lee
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Xiaojia Ren
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Xingzhi Song
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Huandong Sun
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Jonathan Seidman
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Lovelace J Luquette
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Ruibin Xi
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Lynda Chin
- Department of Genomic Medicine, Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.,The Eli and Edythe L. Broad Institute of Massachusetts Institute Of Technology and Harvard University, Cambridge, MA, 02142, USA
| | | | - Peter J Park
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, 02115, USA.,Center for Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Raju Kucherlapati
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA.,Division of Genetics, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Chad J Creighton
- Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA.,Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.,Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA.,Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
| |
|
28
|
Whitford W, Lehnert K, Snell RG, Jacobsen JC. Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data. J Biomed Inform 2019; 94:103174. [PMID: 30965134 DOI: 10.1016/j.jbi.2019.103174] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/12/2018] [Revised: 03/12/2019] [Accepted: 04/06/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, rendering this approach a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data has the potential to detect Copy Number Variants (CNV) to fine resolution. Many CNV detection software packages have been developed exploiting four main types of data: read pair, split read, read depth, and assembly based methods. The aim of this study was to evaluate the efficiency of each of these main approaches in detecting germline deletions. METHODS WGS data and high confidence deletion calls for the individual NA12878 from the Genome in a Bottle consortium were the benchmark dataset. The performance of BreakDancer, CNVnator, Delly, FermiKit, and Pindel was assessed by comparing the accuracy and sensitivity of each software package in detecting deletions exceeding 1 kb. RESULTS There was considerable variability in the outputs of the different WGS CNV detection programs. The best performance was seen from BreakDancer and Delly, with 92.6% and 96.7% sensitivity, respectively, and 34.5% and 68.5% false discovery rate (FDR), respectively. In comparison, Pindel, CNVnator, and FermiKit were less effective, with sensitivities of 69.1%, 66.0%, and 15.8%, respectively, and FDR of 91.3%, 69.0%, and 31.7%, respectively. Concordance across software packages was poor, with only 27 of the total 612 benchmark deletions identified by all five methodologies. CONCLUSIONS The WGS based CNV detection tools evaluated show disparate performance in identifying deletions ≥1 kb, particularly those utilising different input data characteristics. Software that exploits read pair based data had the highest sensitivity, namely BreakDancer and Delly. BreakDancer also had the second lowest false discovery rate. Therefore, in this analysis read pair methods (BreakDancer in particular) were the best performing approaches for the identification of deletions ≥1 kb, balancing accuracy and sensitivity. There is potential for improvement in the detection algorithms, particularly for reducing FDR. This analysis has validated the utility of WGS based CNV detection software to reliably identify deletions, and these findings will be of use when choosing appropriate software for deletion detection, in both research and diagnostic medicine.
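The sensitivity and FDR figures quoted in this abstract can be illustrated with a minimal sketch of how such metrics are commonly computed for deletion calls against a benchmark set. The abstract does not state the matching criterion the authors used; the 50% reciprocal-overlap rule, the function names, and the example intervals below are assumptions for illustration only, not the study's method or data.

```python
from typing import List, Tuple

# A deletion is represented as (start, end) on a single chromosome, half-open.
Interval = Tuple[int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def evaluate(calls: List[Interval], truth: List[Interval],
             min_ro: float = 0.5) -> Tuple[float, float]:
    """Compute (sensitivity, FDR) under an assumed reciprocal-overlap rule.

    sensitivity = benchmark deletions matched by a call / all benchmark deletions
    FDR         = calls matching no benchmark deletion / all calls
    """
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_ro for c in calls)
    )
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_ro for t in truth)
    )
    sensitivity = matched_truth / len(truth) if truth else 0.0
    fdr = 1.0 - matched_calls / len(calls) if calls else 0.0
    return sensitivity, fdr


# Hypothetical intervals, not taken from the study.
truth_set = [(1000, 3000), (10000, 12000), (20000, 25000)]
call_set = [(1100, 3100), (10000, 10500), (40000, 42000), (20000, 25000)]
sens, fdr = evaluate(call_set, truth_set)
print(f"sensitivity={sens:.3f} FDR={fdr:.3f}")
```

With a rule of this kind, a tool can reach high sensitivity while still carrying a high FDR if it emits many calls that match nothing in the benchmark, which is the pattern the abstract reports for Delly.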
Affiliation(s)
- Whitney Whitford
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; Centre for Brain Research, The University of Auckland, New Zealand.
| | - Klaus Lehnert
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; Centre for Brain Research, The University of Auckland, New Zealand.
| | - Russell G Snell
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; Centre for Brain Research, The University of Auckland, New Zealand.
| | - Jessie C Jacobsen
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; Centre for Brain Research, The University of Auckland, New Zealand.
| |
|
29
|
Frankell AM, Jammula S, Li X, Contino G, Killcoyne S, Abbas S, Perner J, Bower L, Devonshire G, Ococks E, Grehan N, Mok J, O'Donovan M, MacRae S, Eldridge MD, Tavaré S, Fitzgerald RC. The landscape of selection in 551 esophageal adenocarcinomas defines genomic biomarkers for the clinic. Nat Genet 2019; 51:506-516. [PMID: 30718927 PMCID: PMC6420087 DOI: 10.1038/s41588-018-0331-5] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 04/27/2018] [Accepted: 12/10/2018] [Indexed: 12/24/2022]
Abstract
Esophageal adenocarcinoma (EAC) is a poor-prognosis cancer type with rapidly rising incidence. Understanding of the genetic events driving EAC development is limited, and there are few molecular biomarkers for prognostication or therapeutics. Using a cohort of 551 genomically characterized EACs with matched RNA sequencing data, we discovered 77 EAC driver genes and 21 noncoding driver elements. We identified a mean of 4.4 driver events per tumor, which were derived more commonly from mutations than copy number alterations, and compared the prevalence of these mutations to the exome-wide mutational excess calculated using non-synonymous to synonymous mutation ratios (dN/dS). We observed mutual exclusivity or co-occurrence of events within and between several dysregulated EAC pathways, a result suggestive of strong functional relationships. Indicators of poor prognosis (SMAD4 and GATA4) were verified in independent cohorts with significant predictive value. Over 50% of EACs contained sensitizing events for CDK4 and CDK6 inhibitors, which were highly correlated with clinically relevant sensitivity in a panel of EAC cell lines and organoids.
Affiliation(s)
- Alexander M Frankell
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - SriGanesh Jammula
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Xiaodun Li
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - Gianmarco Contino
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - Sarah Killcoyne
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Sujath Abbas
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - Juliane Perner
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Lawrence Bower
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Ginny Devonshire
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Emma Ococks
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - Nicola Grehan
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - James Mok
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | | | - Shona MacRae
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK
| | - Matthew D Eldridge
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Simon Tavaré
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Rebecca C Fitzgerald
- MRC cancer unit, Hutchison/MRC research Centre, University of Cambridge, Cambridge, UK.
| |
|
30
|
P'ng C, Green J, Chong LC, Waggott D, Prokopec SD, Shamsi M, Nguyen F, Mak DYF, Lam F, Albuquerque MA, Wu Y, Jung EH, Starmans MHW, Chan-Seng-Yue MA, Yao CQ, Liang B, Lalonde E, Haider S, Simone NA, Sendorek D, Chu KC, Moon NC, Fox NS, Grzadkowski MR, Harding NJ, Fung C, Murdoch AR, Houlahan KE, Wang J, Garcia DR, de Borja R, Sun RX, Lin X, Chen GM, Lu A, Shiah YJ, Zia A, Kearns R, Boutros PC. BPG: Seamless, automated and interactive visualization of scientific data. BMC Bioinformatics 2019; 20:42. [PMID: 30665349 PMCID: PMC6341661 DOI: 10.1186/s12859-019-2610-2] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 05/11/2018] [Accepted: 01/04/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND We introduce BPG, a framework for generating publication-quality, highly-customizable plots in the R statistical environment. RESULTS This open-source package includes multiple methods of displaying high-dimensional datasets and facilitates generation of complex multi-panel figures, making it suitable for complex datasets. A web-based interactive tool allows online figure customization, from which R code can be downloaded for integration with computational pipelines. CONCLUSION BPG provides a new approach for linking interactive and scripted data visualization and is available at http://labs.oicr.on.ca/boutros-lab/software/bpg or via CRAN at https://cran.r-project.org/web/packages/BoutrosLab.plotting.general.
Affiliation(s)
| | - Jeffrey Green
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Daryl Waggott
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | | | | | | | - Felix Lam
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Ying Wu
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Esther H Jung
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | | | - Cindy Q Yao
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Bianca Liang
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Emilie Lalonde
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Syed Haider
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | | | - Kenneth C Chu
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Natalie S Fox
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | | | | | - Clement Fung
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Kathleen E Houlahan
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Jianxin Wang
- Ontario Institute for Cancer Research, Toronto, Canada.,Present address: Center for Computational Research, Buffalo Institute for Genomics and Data Analytics, NYS Center for Excellence in Bioinformatics & Life Science, University at Buffalo, Buffalo, USA
| | | | | | - Ren X Sun
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
| | - Xihui Lin
- Ontario Institute for Cancer Research, Toronto, Canada
| | | | - Aileen Lu
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
| | - Yu-Jia Shiah
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Amin Zia
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Ryan Kearns
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Paul C Boutros
- Ontario Institute for Cancer Research, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada.,Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada.,Department of Human Genetics, University of California, Los Angeles, USA.,Department of Urology, University of California, Los Angeles, USA.,Institute for Precision Health, University of California, Los Angeles, USA.,Jonsson Comprehensive Cancer Center, University of California, Los Angeles, USA.
| |
|