1
Zhao Y, Huang F, Wang W, Gao R, Fan L, Wang A, Gao SH. Application of high-throughput sequencing technologies and analytical tools for pathogen detection in urban water systems: Progress and future perspectives. Sci Total Environ 2023; 900:165867. [PMID: 37516185 DOI: 10.1016/j.scitotenv.2023.165867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 07/31/2023]
Abstract
Pathogenic microorganisms, such as viruses, bacteria, fungi, and protozoa, are ubiquitous in urban water systems and pose a significant risk to public health. Infectious waterborne diseases mediated by urban water systems have become one of the leading global causes of mortality. However, the detection and monitoring of these pathogenic microorganisms have been limited by the complexity and diversity of environmental samples. Conventional methods are restricted by long assay times, high benchmarks for identification, and narrow application scenarios. Novel technologies, such as high-throughput sequencing, potentially enable full-spectrum detection of trace pathogenic microorganisms in complex environmental matrices. This review concisely summarizes the current state of high-throughput sequencing technologies for identifying pathogenic microorganisms in urban water systems. Furthermore, future perspectives in pathogen research emphasize the need for detection methods with high accuracy and sensitivity, the establishment of precise detection standards and procedures, and the significance of bioinformatics software and platforms. We have compiled a list of pathogen analysis software, platforms, and databases with robust engines and high accuracy for reference. We highlight the significance of combining targeted and non-targeted sequencing technologies, short- and long-read technologies, and sequencing technologies with bioinformatic tools in pursuing improved biosafety in urban water systems.
Affiliation(s)
- Yanmei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
- Fang Huang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Wenxiu Wang
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China.
- Rui Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Lu Fan
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Aijie Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Shu-Hong Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
2
Goubet AG. Could the tumor-associated microbiota be the new multi-faceted player in the tumor microenvironment? Front Oncol 2023; 13:1185163. [PMID: 37287916 PMCID: PMC10242102 DOI: 10.3389/fonc.2023.1185163] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Accepted: 05/02/2023] [Indexed: 06/09/2023]
Abstract
Microorganisms have been identified in tumor specimens for over a century. It is only in recent years that tumor-associated microbiota has become a rapidly expanding field. Assessment techniques encompass methods at the frontiers of molecular biology, microbiology, and histology, requiring a transdisciplinary process to carefully decipher this new component of the tumor microenvironment. Due to the low biomass, the study of tumor-associated microbiota poses technical, analytical, biological, and clinical challenges and must be approached as a whole. To date, several studies have begun to shed light on the composition, functions, and clinical relevance of the tumor-associated microbiota. This new piece of the tumor microenvironment puzzle could potentially change the way we think about and treat patients with cancer.
Affiliation(s)
- Anne-Gaëlle Goubet
- Department of Pathology and Immunology, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- AGORA Cancer Research Center, Lausanne, Switzerland
- Swiss Cancer Center Léman, Lausanne, Switzerland
3
Turan H, Vitale SG, Kahramanoglu I, Della Corte L, Giampaolino P, Azemi A, Durmus S, Sal V, Tokgozoglu N, Bese T, Arvas M, Demirkiran F, Gelisgen R, Ilvan S, Uzun H. Diagnostic and prognostic role of TFF3, Romo-1, NF-κB and SFRP4 as biomarkers for endometrial and ovarian cancers: a prospective observational translational study. Arch Gynecol Obstet 2022; 306:2105-2114. [PMID: 35461390 PMCID: PMC9633503 DOI: 10.1007/s00404-022-06563-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2022] [Accepted: 04/01/2022] [Indexed: 12/24/2022]
Abstract
Purpose: This study aimed to evaluate trefoil factor 3 (TFF3), secreted frizzled-related protein 4 (sFRP4), reactive oxygen species modulator 1 (Romo1) and nuclear factor kappa B (NF-κB) as diagnostic and prognostic markers of endometrial cancer (EC) and ovarian cancer (OC). Methods: Thirty-one patients with EC and 30 patients with OC who had undergone surgical treatment were enrolled together with 30 healthy controls in a prospective study. Serum TFF3, Romo1, NF-κB and sFRP4 concentrations were determined with commercial ELISA kits. Results: Serum TFF3, Romo1 and NF-κB levels were significantly higher in patients with EC and OC than in those without cancer. Regarding EC, none of the serum biomarkers differed significantly between endometrioid and non-endometrioid endometrial carcinomas. Mean serum TFF3 and NF-κB levels were significantly higher in advanced stages. Increased serum levels of TFF3 and NF-κB were found in those with a higher grade of the disease. Regarding OC, none of the serum biomarkers differed significantly among histological subtypes. Serum NF-κB levels were significantly higher in patients with advanced-stage OC than in those with stage I and II disease. No difference in serum biomarker levels was found between those who had a recurrence and those who had not. The sensitivity and specificity of these four biomarkers in discriminating EC and OC from the control group were encouraging, although none reached 70%. Conclusions: TFF3, Romo1, NF-κB and sFRP4 could represent new diagnostic and prognostic markers for OC and EC. Further studies are needed to validate our results.
Affiliation(s)
- Hasan Turan
- Department of Gynecologic Oncology, Health Science University, Cam Sakura Training and Research Hospital, Istanbul, Turkey
- Salvatore Giovanni Vitale
- Obstetrics and Gynecology Unit, Department of General Surgery and Medical Surgical Specialties, University of Catania, Via Santa Sofia 78, 95123, Catania, Italy.
- Luigi Della Corte
- Department of Neuroscience, Reproductive Sciences and Dentistry, School of Medicine, University of Naples, Naples, Italy
- Pierluigi Giampaolino
- Department of Public Health, University of Naples Federico II, Via Sergio Pansini, Naples, Italy
- Asli Azemi
- Department of Biochemistry, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Sinem Durmus
- Department of Biochemistry, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Veysel Sal
- Department of Obstetrics and Gynecology, Memorial Bahcelievler Hospital, Istanbul, Turkey
- Nedim Tokgozoglu
- Department of Gynecologic Oncology, Okmeydanı Training and Research Hospital, Istanbul, Turkey
- Tugan Bese
- Department of Gynecologic Oncology, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Macit Arvas
- Department of Gynecologic Oncology, American Hospital, Istanbul, Turkey
- Fuat Demirkiran
- Department of Gynecologic Oncology, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Remise Gelisgen
- Department of Biochemistry, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Sennur Ilvan
- Department of Pathology, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Hafize Uzun
- Department of Biochemistry, School of Medicine, Istanbul University-Cerrahpasa, Istanbul, Turkey
4
Yu D, Wang T, Liang D, Mei Y, Zou W, Guo S. The Landscape of Microbial Composition and Associated Factors in Pancreatic Ductal Adenocarcinoma Using RNA-Seq Data. Front Oncol 2021; 11:651350. [PMID: 34136388 PMCID: PMC8202409 DOI: 10.3389/fonc.2021.651350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/09/2021] [Accepted: 03/30/2021] [Indexed: 01/14/2023]
Abstract
Recent studies interrogating the tumor microbiome (including bacteria, viruses, and fungi) have yielded important insights into the role of microbes in carcinogenesis, therapeutic responses, and resistance. The pancreas was once thought to be a sterile organ, but a number of studies have shown the presence of microbes within it in pancreatic ductal adenocarcinoma (PDAC). A microbiome–pancreas axis for PDAC carcinogenesis has been proposed. However, the microbial composition of localized PDAC tissue is still unclear. The associations between the microbiome and PDAC reported in previous studies were detected indirectly, mostly using stool, saliva, and intestinal samples. This study integrated 582 samples derived from PDAC tissues across four datasets and presented a genus-level landscape of the tumor microbiome in PDAC based on re-mining of RNA-Seq data. On average, hundreds of genera were distributed in the PDAC tissue, and dozens of core microbiota were identified in PDAC tissue. The pan-microbiome of PDAC tissue was also estimated and might surpass 2,500 genera. In addition, sampling site (stroma vs. epithelium) and tissue source (human tissue vs. PDX) were found to have great effects on the microbial composition of PDAC tissue, whereas the traditional risk factors (sex and age) did not. This is the first study to systematically explore the microbial composition of PDAC tissue, and it contributes to a deeper understanding of the tumor microbiome. The specific taxa identified might be potential biomarkers for follow-up studies.
Affiliation(s)
- Dong Yu
- Center of Translational Medicine, Second Military Medical University, Shanghai, China; Shanghai Key Laboratory of Cell Engineering, Shanghai, China
- Tengjiao Wang
- Center of Translational Medicine, Second Military Medical University, Shanghai, China; Shanghai Key Laboratory of Cell Engineering, Shanghai, China
- Dong Liang
- Center of Translational Medicine, Second Military Medical University, Shanghai, China; Shanghai Key Laboratory of Cell Engineering, Shanghai, China
- Yue Mei
- Center of Translational Medicine, Second Military Medical University, Shanghai, China; Shanghai Key Laboratory of Cell Engineering, Shanghai, China
- Wenbin Zou
- Department of Gastroenterology, Changhai Hospital, Second Military Medical University, Shanghai, China
- Shiwei Guo
- Department of General Surgery, Changhai Hospital, Second Military Medical University, Shanghai, China
5
Rodriguez RM, Khadka VS, Menor M, Hernandez BY, Deng Y. Tissue-associated microbial detection in cancer using human sequencing data. BMC Bioinformatics 2020; 21:523. [PMID: 33272199 PMCID: PMC7713026 DOI: 10.1186/s12859-020-03831-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/19/2022]
Abstract
Cancer is one of the leading causes of morbidity and mortality worldwide. Microbial infections account for up to 20% of the total global cancer burden. The human microbiota within each organ system is distinct, and its compositional variation and interactions with the human host are known to have both detrimental and beneficial effects on tumor progression. With the advent of next-generation sequencing (NGS) technologies, NGS data are being used for pathogen detection in cancer. Numerous bioinformatics frameworks have been developed to study viral information from host-sequencing data and can be adapted to bacterial studies. This review highlights popular existing computational frameworks that use NGS data as input to decipher microbial composition, and whose output can predict functional compositional differences with clinically relevant applicability in the development of treatment and prevention strategies.
Affiliation(s)
- Rebecca M. Rodriguez
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
- NIDDK Central Repository, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, USA
- Vedbar S. Khadka
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Mark Menor
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Brenda Y. Hernandez
- Epidemiology, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
- Youping Deng
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
6
Chen X, Kost J, Li D. Comprehensive comparative analysis of methods and software for identifying viral integrations. Brief Bioinform 2020; 20:2088-2097. [PMID: 30102374 DOI: 10.1093/bib/bby070] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/18/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
Many viruses are capable of integrating in the human genome, particularly viruses involved in tumorigenesis. Viral integrations can be considered genetic markers for discovering virus-caused cancers and inferring cancer cell development. Next-generation sequencing (NGS) technologies have been widely used to screen for viral integrations in cancer genomes, and a number of bioinformatics tools have been developed to detect viral integrations using NGS data. However, there has been no systematic comparison of the methods or software. In this study, we performed a comprehensive comparative analysis of the designs, performance, functionality and limitations among the existing methods and software for detecting viral integrations. We further compared the sensitivity, precision and runtime of integration detection of four representative tools. Our analyses showed that each of the existing software had its own merits; however, none of them were sufficient for parallel or accurate virome-wide detection. After carefully evaluating the limitations shared by the existing methods, we proposed strategies and directions for developing virome-wide integration detection.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Jason Kost
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA; Cancer Center, University of Vermont, Burlington, Vermont 05405, USA
7
Robitaille A, Brancaccio RN, Dutta S, Rollison DE, Leja M, Fischer N, Grundhoff A, Gheit T, Tommasino M, Olivier M. PVAmpliconFinder: a workflow for the identification of human papillomaviruses from high-throughput amplicon sequencing. BMC Bioinformatics 2020; 21:233. [PMID: 32513098 PMCID: PMC7282039 DOI: 10.1186/s12859-020-03573-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2019] [Accepted: 05/28/2020] [Indexed: 01/06/2023]
Abstract
BACKGROUND The detection of known human papillomaviruses (PVs) from targeted wet-lab approaches has traditionally used PCR-based methods coupled with Sanger sequencing. With the introduction of next-generation sequencing (NGS), these approaches can be revisited to integrate the sequencing power of NGS. Although computational tools have been developed for metagenomic approaches to search for known or novel viruses in NGS data, no appropriate tool is available for the classification and identification of novel viral sequences from data produced by amplicon-based methods. RESULTS We have developed PVAmpliconFinder, a data analysis workflow designed to rapidly identify and classify known and potentially new Papillomaviridae sequences from NGS amplicon sequencing with degenerate PV primers. Here, we describe the features of PVAmpliconFinder and its implementation using biological data obtained from amplicon sequencing of human skin swab specimens and oral rinses from healthy individuals. CONCLUSIONS PVAmpliconFinder identified putative new HPV sequences, including one that was validated by wet-lab experiments. PVAmpliconFinder can be easily modified and applied to other viral families. PVAmpliconFinder addresses a gap by providing a solution for the analysis of NGS amplicon sequencing, increasingly used in clinical research. The PVAmpliconFinder workflow, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/PVAmpliconFinder.
Affiliation(s)
- Sankhadeep Dutta
- International Agency for Research on Cancer, Lyon, France
- Department of Oncogene Regulation, Chittaranjan National Cancer Institute, Kolkata, India
- Dana E Rollison
- Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, Florida, USA
- Marcis Leja
- Institute of Clinical and Preventive Medicine, University of Latvia, Riga, Latvia
- Nicole Fischer
- German Center for Infection Research, Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany
- Institute for Medical Microbiology, Virology and Hygiene, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Adam Grundhoff
- German Center for Infection Research, Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany
- Heinrich Pette Institut, Leibniz Institut for Experimental Virology, Hamburg, Germany
- Tarik Gheit
- International Agency for Research on Cancer, Lyon, France
- Magali Olivier
- International Agency for Research on Cancer, Lyon, France.
8
Rodriguez RM, Hernandez BY, Menor M, Deng Y, Khadka VS. The landscape of bacterial presence in tumor and adjacent normal tissue across 9 major cancer types using TCGA exome sequencing. Comput Struct Biotechnol J 2020; 18:631-641. [PMID: 32257046 PMCID: PMC7109368 DOI: 10.1016/j.csbj.2020.03.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/01/2019] [Revised: 03/02/2020] [Accepted: 03/06/2020] [Indexed: 12/26/2022]
Abstract
Identification of microbial composition directly from tumor tissue permits studying the relationship between microbial changes and cancer pathogenesis. We interrogated bacterial presence in tumor and adjacent normal tissue strictly in pairs, using human whole exome sequencing to generate microbial profiles. Profiles were generated for 813 cases from stomach, liver, colon, rectal, lung, head & neck, cervical and bladder TCGA cohorts. Core microbiota examination revealed twelve taxa to be common across the nine cancer types at all classification levels. Paired analyses demonstrated significant differences in bacterial shifts between tumor and adjacent normal tissue across stomach, colon, lung squamous cell, and head & neck cohorts, whereas little or no difference was evident in liver, rectal, lung adenocarcinoma, cervical and bladder cancer cohorts in adjusted models. Helicobacter pylori in stomach and Bacteroides vulgatus in colon were found to be significantly higher in adjacent normal tissue than in tumor tissue after false discovery rate correction. Computational results were validated with tissue from an independent population by species-specific qPCR, showing similar patterns of co-occurrence among Fusobacterium nucleatum and Selenomonas sputigena in gastric samples. This study demonstrates the ability to identify differential bacterial composition from human tissue whole exome sequences. Taken together, our results suggest that microbial profiles shift with advanced disease and that the microbial composition of the adjacent tissue can be indicative of cancer stage and disease progression.
Key Words
- BLCA, bladder carcinoma
- CESC, cervical & endocervical squamous cell carcinomas
- COAD, colon adenocarcinoma
- COREAD, colon and rectal adenocarcinoma TCGA cohorts
- Cancer microbiome
- Exome sequencing
- HNSC, head & neck squamous cell carcinoma
- L2FC, log 2 fold change
- LIHC, liver hepatocellular carcinoma
- LUAD, lung adenocarcinoma
- LUSC, lung squamous cell carcinoma
- Microbial landscape
- READ, rectal adenocarcinoma
- STAD, stomach adenocarcinoma
- TCGA
- TCGA, The Cancer Genome Atlas
Affiliation(s)
- Rebecca M. Rodriguez
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii Mānoa, Honolulu, HI, United States
- Population Sciences in the Pacific Program-Cancer Epidemiology, University of Hawaii Cancer Center, Honolulu, HI, United States
- Brenda Y. Hernandez
- Epidemiology, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI, United States
- Population Sciences in the Pacific Program-Cancer Epidemiology, University of Hawaii Cancer Center, Honolulu, HI, United States
- Mark Menor
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii Mānoa, Honolulu, HI, United States
- Youping Deng
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii Mānoa, Honolulu, HI, United States
- Vedbar S. Khadka
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii Mānoa, Honolulu, HI, United States
9
Zapatka M, Borozan I, Brewer DS, Iskar M, Grundhoff A, Alawi M, Desai N, Sültmann H, Moch H, Cooper CS, Eils R, Ferretti V, Lichter P. The landscape of viral associations in human cancers. Nat Genet 2020; 52:320-330. [PMID: 32025001 PMCID: PMC8076016 DOI: 10.1038/s41588-019-0558-9] [Citation(s) in RCA: 220] [Impact Index Per Article: 55.0] [Received: 11/30/2018] [Accepted: 11/22/2019] [Indexed: 12/30/2022]
Abstract
Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, for which whole-genome and (for a subset) whole-transcriptome sequencing data from 2,658 cancers across 38 tumor types were aggregated, we systematically investigated potential viral pathogens using a consensus approach that integrated three independent pipelines. Viruses were detected in 382 genome and 68 transcriptome datasets. We found a high prevalence of known tumor-associated viruses such as Epstein-Barr virus (EBV), hepatitis B virus (HBV) and human papillomavirus (HPV; for example, HPV16 or HPV18). The study revealed significant exclusivity of HPV and driver mutations in head-and-neck cancer and the association of HPV with APOBEC mutational signatures, which suggests that impaired antiviral defense is a driving force in cervical, bladder and head-and-neck carcinoma. For HBV, HPV16, HPV18 and adeno-associated virus-2 (AAV2), viral integration was associated with local variations in genomic copy numbers. Integrations at the TERT promoter were associated with high telomerase expression, evidently activating this tumor-driving process. High levels of endogenous retrovirus (ERV1) expression were linked to a worse survival outcome in patients with kidney cancer.
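The abstract above mentions a consensus approach over three independent pipelines but does not state the combination rule. As a minimal sketch only, the following assumes a simple "reported by at least two pipelines" rule; the function name `consensus_calls`, the `min_support` parameter and the toy call sets are hypothetical and are not taken from the PCAWG analysis.

```python
from collections import Counter


def consensus_calls(pipeline_calls, min_support=2):
    """Return virus names reported by at least `min_support` pipelines.

    `pipeline_calls` is a list with one collection of virus names per pipeline.
    The voting rule is an illustrative assumption, not the published method.
    """
    support = Counter()
    for calls in pipeline_calls:
        # Convert to a set so each pipeline contributes at most one vote per virus.
        support.update(set(calls))
    return sorted(name for name, votes in support.items() if votes >= min_support)


# Toy calls for one sample from three hypothetical pipelines.
calls = [
    {"HPV16", "EBV", "HBV"},
    {"HPV16", "EBV"},
    {"HPV16", "AAV2"},
]
print(consensus_calls(calls))  # ['EBV', 'HPV16']
```

Raising `min_support` to 3 would keep only calls on which all three pipelines agree, trading sensitivity for fewer spurious detections.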
Affiliation(s)
- Marc Zapatka
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ivan Borozan
- Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Daniel S Brewer
- Norwich Medical School, University of East Anglia, Norwich, UK
- Earlham Institute, Norwich, UK
- Murat Iskar
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Adam Grundhoff
- Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- German Center for Infection Research (DZIF), Partner Site Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany
- Malik Alawi
- Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- Bioinformatics Core, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Nikita Desai
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, Francis Crick Institute, London, UK
- Holger Sültmann
- National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Holger Moch
- Department of Pathology and Molecular Pathology, University and University Hospital Zürich, Zurich, Switzerland
- Colin S Cooper
- Norwich Medical School, University of East Anglia, Norwich, UK
- Earlham Institute, Norwich, UK
- Institute of Cancer Research, London, UK
- University of East Anglia, Norwich, UK
- Roland Eils
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology, Heidelberg University and BioQuant Center, Heidelberg, Germany
- Center for Digital Health, Berlin Institute of Health and Charité Universitätsmedizin Berlin, Berlin, Germany
- Vincent Ferretti
- Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Québec, Canada
- Peter Lichter
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- German Cancer Consortium (DKTK), Heidelberg, Germany.
10
Sangiovanni M, Granata I, Thind AS, Guarracino MR. From trash to treasure: detecting unexpected contamination in unmapped NGS data. BMC Bioinformatics 2019; 20:168. [PMID: 30999839 PMCID: PMC6472186 DOI: 10.1186/s12859-019-2684-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 02/08/2023]
Abstract
Background: Next Generation Sequencing (NGS) experiments produce millions of short sequences that, mapped to a reference genome, provide biological insights at the genomic, transcriptomic and epigenomic levels. Typically, the proportion of reads that correctly map to the reference genome ranges between 70% and 90%, leaving in some cases a considerable fraction of unmapped sequences. This 'misalignment' can be ascribed to low-quality bases or sequence differences between the sample reads and the reference genome. Investigating the source of the unmapped reads is important to better assess the quality of the whole experiment and to check for possible downstream or upstream 'contamination' from exogenous nucleic acids. Results: Here we propose DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. It uses a subtraction approach to identify contamination from bacterial, fungal and viral genomes. DecontaMiner generates several output files to track all the processed reads and to provide a complete report of their characteristics. Good-quality matches to microorganism genomes are counted and compared among samples. DecontaMiner builds an offline HTML page containing summary statistics and plots; the plots are produced with the D3 JavaScript library. DecontaMiner has mainly been used to detect contamination in human RNA-Seq data. The software is freely available at http://www-labgtp.na.icar.cnr.it/decontaminer. Conclusions: DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, which might derive either from laboratory contamination or from their biological source, and in both cases can be considered worthy of further investigation and experimental validation. The novelty of DecontaMiner lies mainly in its easy integration with the standard procedures of NGS data analysis, while providing a complete, reliable, and automatic pipeline. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2684-x) contains supplementary material, which is available to authorized users.
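The subtraction idea described in this abstract can be illustrated with a toy sketch: discard reads that mapped to the host, then count the remaining reads that match a microbial reference. This is not DecontaMiner's implementation; the function name `subtract_and_classify`, the `min_len` cutoff, the exact-substring matching (a stand-in for a real aligner) and all sequences are assumptions made for illustration.

```python
def subtract_and_classify(reads, host_mapped_ids, contaminant_refs, min_len=8):
    """Toy subtraction: drop host-mapped and short reads, then count matches.

    reads            -- dict of read id -> nucleotide sequence
    host_mapped_ids  -- set of read ids that aligned to the host genome
    contaminant_refs -- dict of organism label -> reference sequence
    Exact substring search stands in for a real alignment step.
    """
    hits = {}
    for read_id, seq in reads.items():
        if read_id in host_mapped_ids or len(seq) < min_len:
            continue  # subtract host reads and reads too short to place reliably
        for organism, ref in contaminant_refs.items():
            if seq in ref:
                hits[organism] = hits.get(organism, 0) + 1
                break  # assign each read to the first matching reference only
    return hits


# Hypothetical reads and references, invented for this example.
reads = {
    "r1": "ACGTACGTAC",  # mapped to host, so it is subtracted
    "r2": "TTGACCGGTA",  # matches the toy bacterial reference
    "r3": "GGGCCCAAAT",  # matches the toy viral reference
    "r4": "ACG",         # shorter than min_len, so it is skipped
}
refs = {"toy_bacterium": "CCTTGACCGGTAGG", "toy_virus": "AAGGGCCCAAATTT"}
print(subtract_and_classify(reads, {"r1"}, refs))
```

Per-organism counts of this kind are what would then be compared among samples, as the abstract describes.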
Collapse
Affiliation(s)
- Mara Sangiovanni
- Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
| | - Ilaria Granata
- High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy.
| | - Amarinder Singh Thind
- High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy
| | - Mario Rosario Guarracino
- High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy
| |
|
11
|
Chen X, Kost J, Sulovari A, Wong N, Liang WS, Cao J, Li D. A virome-wide clonal integration analysis platform for discovering cancer viral etiology. Genome Res 2019; 29:819-830. [PMID: 30872350 PMCID: PMC6499315 DOI: 10.1101/gr.242529.118] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 07/31/2018] [Accepted: 03/11/2019] [Indexed: 12/31/2022]
Abstract
Oncoviral infection is responsible for 12%–15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the process of identifying genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform for identifying viral integrations that are derived from any characterized viruses and shared by a large proportion of tumor cells using whole-genome sequencing (WGS) data. The sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK Virus (BKV), suggesting the involvement of these viruses in early stages of tumorigenesis in affected tumors, such as HBV in TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma. We also showed the capacity of VIcaller to identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching clonal viral integrations with our platform has the capacity to identify virus-caused cancers and discover cancer viral etiologies.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Jason Kost
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Nathalie Wong
- Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong 999077, P.R. China
| | - Winnie S Liang
- Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
| | - Jian Cao
- Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA.,Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA.,Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA.,Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
| |
|
12
|
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022]
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
| | - Dennis Schmitz
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
| | - Harry Vennema
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
| | - Annelies Kroneman
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
| | - Marion P G Koopmans
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
| |
|
13
|
Tang KW, Larsson E. Tumour virology in the era of high-throughput genomics. Philos Trans R Soc Lond B Biol Sci 2018; 372:rstb.2016.0265. [PMID: 28893932 PMCID: PMC5597732 DOI: 10.1098/rstb.2016.0265] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Accepted: 06/09/2017] [Indexed: 12/12/2022]
Abstract
With the advent of massively parallel sequencing, oncogenic viruses in tumours can now be detected in an unbiased and comprehensive manner. Additionally, new viruses or strains can be discovered based on sequence similarity with known viruses. Using this approach, the causative agent for Merkel cell carcinoma was identified. Subsequent studies using data from large collections of tumours have confirmed models built during decades of hypothesis-driven and low-throughput research, and a more detailed and comprehensive description of virus-tumour associations has emerged. Notably, large cohorts and high sequencing depth, in combination with newly developed bioinformatical techniques, have made it possible to rule out several suggested virus-tumour associations with a high degree of confidence. In this review we discuss possibilities, limitations and insights gained from using massively parallel sequencing to characterize tumours with viral content, with emphasis on detection of viral sequences and genomic integration events. This article is part of the themed issue 'Human oncogenic viruses'.
Affiliation(s)
- Ka-Wei Tang
- Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
| | - Erik Larsson
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
| |
|
14
|
Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma. J Virol 2018; 92:JVI.01239-17. [PMID: 29093097 DOI: 10.1128/jvi.01239-17] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 07/19/2017] [Accepted: 10/05/2017] [Indexed: 01/10/2023]
Abstract
Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis.
IMPORTANCE Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis.
|
15
|
Cantalupo PG, Katz JP, Pipas JM. Viral sequences in human cancer. Virology 2017; 513:208-216. [PMID: 29107929 DOI: 10.1016/j.virol.2017.10.017] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 08/10/2017] [Revised: 10/10/2017] [Accepted: 10/19/2017] [Indexed: 01/14/2023]
Abstract
We have developed a virus detection and discovery computational pipeline, Pickaxe, and applied it to NGS databases provided by The Cancer Genome Atlas (TCGA). We analyzed a collection of whole genome (WGS), exome (WXS), and RNA (RNA-Seq) sequencing libraries from 3052 participants across 22 different cancers. NGS data from nearly all tumor and normal tissues examined contained contaminating viral sequences. Intensive computational and manual efforts are required to remove these artifacts. We found that several different types of cancers harbored Herpesviruses including EBV, CMV, HHV1, HHV2, HHV6 and HHV7. In addition to the reported associations of Hepatitis B and C virus (HBV & HCV) with liver cancer, and Human papillomaviruses (HPV) with cervical cancer and a subset of head and neck cancers, we found additional cases of HPV integrated in a small number of bladder cancers. Gene expression and mutational profiles suggest that HPV drives tumorigenesis in these cases.
Affiliation(s)
- Paul G Cantalupo
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Joshua P Katz
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - James M Pipas
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
|
16
|
Brhelova E, Antonova M, Pardy F, Kocmanova I, Mayer J, Racil Z, Lengerova M. Investigation of next-generation sequencing data of Klebsiella pneumoniae using web-based tools. J Med Microbiol 2017; 66:1673-1683. [PMID: 29068275 DOI: 10.1099/jmm.0.000624] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
PURPOSE Rapid identification and characterization of multidrug-resistant Klebsiella pneumoniae strains is necessary due to the increasing frequency of severe infections in patients. The decreasing cost of next-generation sequencing enables us to obtain a comprehensive overview of genetic information in one step. The aim of this study is to demonstrate and evaluate the utility and scope of the application of web-based databases to next-generation sequenced (NGS) data. METHODOLOGY The whole genomes of 11 clinical Klebsiella pneumoniae isolates were sequenced using Illumina MiSeq. Selected web-based tools were used to identify a variety of genetic characteristics, such as acquired antimicrobial resistance genes, multilocus sequence types, plasmid replicons, and identify virulence factors, such as virulence genes, cps clusters, urease-nickel clusters and efflux systems. RESULTS Using web-based tools hosted by the Center for Genomic Epidemiology, we detected resistance to 8 main antimicrobial groups with at least 11 acquired resistance genes. The isolates were divided into eight sequence types (ST11, 23, 37, 323, 433, 495 and 562, and a new one, ST1646). All of the isolates carried replicons of large plasmids. Capsular types, virulence factors and genes coding AcrAB and OqxAB efflux pumps were detected using BIGSdb-Kp, whereas the selected virulence genes, identified in almost all of the isolates, were detected using CLC Genomic Workbench software. CONCLUSION Applying appropriate web-based online tools to NGS data enables the rapid extraction of comprehensive information that can be used for more efficient diagnosis and treatment of patients, while data processing is free of charge, easy and time-efficient.
Affiliation(s)
- Eva Brhelova
- Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
| | - Mariya Antonova
- Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
| | - Filip Pardy
- CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Iva Kocmanova
- Department of Clinical Microbiology, University Hospital Brno, Brno, Czech Republic
| | - Jiri Mayer
- Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.,CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Zdenek Racil
- Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.,CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Martina Lengerova
- CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.,Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
| |
|
17
|
Doggett NA, Mukundan H, Lefkowitz EJ, Slezak TR, Chain PS, Morse S, Anderson K, Hodge DR, Pillai S. Culture-Independent Diagnostics for Health Security. Health Secur 2017; 14:122-42. [PMID: 27314653 DOI: 10.1089/hs.2015.0074] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/20/2022]
Abstract
The past decade has seen considerable development in the diagnostic application of nonculture methods, including nucleic acid amplification-based methods and mass spectrometry, for the diagnosis of infectious diseases. The implications of these new culture-independent diagnostic tests (CIDTs) include bypassing the need to culture organisms, thus potentially affecting public health surveillance systems, which continue to use isolates as the basis of their surveillance programs and to assess phenotypic resistance to antimicrobial agents. CIDTs may also affect the way public health practitioners detect and respond to a bioterrorism event. In response to a request from the Department of Homeland Security, Los Alamos National Laboratory and the Centers for Disease Control and Prevention cosponsored a workshop to review the impact of CIDTs on the rapid detection and identification of biothreat agents. Four panel discussions were held that covered nucleic acid amplification-based diagnostics, mass spectrometry, antibody-based diagnostics, and next-generation sequencing. Exploiting the extensive expertise available at this workshop, we identified the key features, benefits, and limitations of the various CIDT methods for providing rapid pathogen identification that are critical to the response and mitigation of a bioterrorism event. After the workshop we conducted a thorough review of the literature, investigating the current state of these 4 culture-independent diagnostic methods. This article combines information from the literature review and the insights obtained at the workshop.
|
18
|
VirusSeeker, a computational pipeline for virus discovery and virome composition analysis. Virology 2017; 503:21-30. [PMID: 28110145 DOI: 10.1016/j.virol.2017.01.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 11/18/2016] [Revised: 01/07/2017] [Accepted: 01/10/2017] [Indexed: 01/21/2023]
Abstract
The advent of Next Generation Sequencing (NGS) has vastly increased our ability to discover novel viruses and to systematically define the spectrum of viruses present in a given specimen. Such studies have led to the discovery of novel viral pathogens as well as broader associations of the virome with diverse diseases including inflammatory bowel disease, severe acute malnutrition and HIV/AIDS. Critical to the success of these efforts are robust bioinformatic pipelines for rapid classification of microbial sequences. Existing computational tools are typically focused on either eukaryotic virus discovery or virome composition analysis but not both. Here we present VirusSeeker, a BLAST-based NGS data analysis pipeline designed for both purposes. VirusSeeker has been successfully applied in several previously published virome studies. Here we demonstrate the functionality of VirusSeeker in both novel virus discovery and virome composition analysis.
|
19
|
Bullman S, Meyerson M, Kostic AD. Emerging Concepts and Technologies for the Discovery of Microorganisms Involved in Human Disease. ANNUAL REVIEW OF PATHOLOGY-MECHANISMS OF DISEASE 2016; 12:217-244. [PMID: 27959634 DOI: 10.1146/annurev-pathol-012615-044305] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Established infectious agents continue to be a major cause of human morbidity and mortality worldwide. However, the causative agent remains unknown for a wide range of diseases; many of these are suspected to be attributable to yet undiscovered microorganisms. The advent of unbiased high-throughput sequencing and bioinformatics has enabled rapid identification and molecular characterization of known and novel infectious agents in human disease. An exciting era of microbe discovery, now under way, holds great promise for the improvement of global health via the development of antimicrobial therapies, vaccination strategies, targeted public health measures, and probiotic-based preventions and therapies. Here, we review the history of pathogen discovery, discuss improvements and clinical applications for the detection of microbially associated diseases, and explore the challenges and strategies for establishing causation in human disease.
Affiliation(s)
- Susan Bullman
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Matthew Meyerson
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142.,Harvard Medical School, Boston, Massachusetts 02115
| | - Aleksandar D Kostic
- Research Division, Joslin Diabetes Center, Boston, Massachusetts 02215.,Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115
| |
|
20
|
Karapiperis C, Kempf SJ, Quintens R, Azimzadeh O, Vidal VL, Pazzaglia S, Bazyka D, Mastroberardino PG, Scouras ZG, Tapio S, Benotmane MA, Ouzounis CA. Brain Radiation Information Data Exchange (BRIDE): integration of experimental data from low-dose ionising radiation research for pathway discovery. BMC Bioinformatics 2016; 17:212. [PMID: 27170263 PMCID: PMC4865096 DOI: 10.1186/s12859-016-1068-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/27/2015] [Accepted: 04/21/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The underlying molecular processes representing stress responses to low-dose ionising radiation (LDIR) in mammals are just beginning to be understood. In particular, LDIR effects on the brain and their possible association with neurodegenerative disease are currently being explored using omics technologies. RESULTS We describe a light-weight approach for the storage, analysis and distribution of relevant LDIR omics datasets. The data integration platform, called BRIDE, contains information from the literature as well as experimental information from transcriptomics and proteomics studies. It deploys a hybrid, distributed solution using both local storage and cloud technology. CONCLUSIONS BRIDE can act as a knowledge broker for LDIR researchers, to facilitate molecular research on the systems biology of LDIR response in mammals. Its flexible design can capture a range of experimental information for genomics, epigenomics, transcriptomics, and proteomics. The data collection is available at: .
Affiliation(s)
- Christos Karapiperis
- Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessalonica, 54124, Thessalonica, Greece
| | - Stefan J Kempf
- Institute of Radiation Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Present address: Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
| | - Roel Quintens
- Radiobiology Unit, Belgian Nuclear Research Centre (SCK•CEN), B-2400, Mol, Belgium
| | - Omid Azimzadeh
- Institute of Radiation Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
| | - Victoria Linares Vidal
- School of Medicine, IISPV, "Rovira i Virgili" University, Sant Llorens 21, 43201, Reus, Spain
| | - Simonetta Pazzaglia
- Laboratory of Radiation Biology & Biomedicine, Agenzia Nazionale per le Nuove Tecnologie, l'Energia e lo Sviluppo Economico Sostenibile (ENEA) Centro Ricerche Casaccia, 00123, Rome, Italy
| | - Dimitry Bazyka
- National Research Center for Radiation Medicine of the National Academy of Medical Sciences of Ukraine, Melnykov str. 53, Kyiv, 04050, Ukraine
| | | | - Zacharias G Scouras
- Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessalonica, 54124, Thessalonica, Greece
| | - Soile Tapio
- Institute of Radiation Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany.
| | | | - Christos A Ouzounis
- Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessalonica, 54124, Thessalonica, Greece.
- Biological Process & Computation Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), Thessalonica, 57001, Greece.
| |
|
21
|
Friis-Nielsen J, Kjartansdóttir KR, Mollerup S, Asplund M, Mourier T, Jensen RH, Hansen TA, Rey-Iglesia A, Richter SR, Nielsen IB, Alquezar-Planas DE, Olsen PVS, Vinner L, Fridholm H, Nielsen LP, Willerslev E, Sicheritz-Pontén T, Lund O, Hansen AJ, Izarzugaza JMG, Brunak S. Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers. Viruses 2016; 8:E53. [PMID: 26907326 PMCID: PMC4776208 DOI: 10.3390/v8020053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/30/2015] [Revised: 01/29/2016] [Accepted: 02/05/2016] [Indexed: 12/17/2022]
Abstract
Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified.
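The co-occurrence idea in this abstract, where sequence clusters are tracked across many libraries and checked against non-template controls, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `recurrent_clusters`, the input layout (sample name mapped to a set of cluster IDs), and the `min_samples` cut-off are hypothetical, and the statistical association testing described in the abstract is replaced here by a simple recurrence count.

```python
from collections import defaultdict


def recurrent_clusters(sample_clusters, control_samples, min_samples=2):
    """Report sequence clusters that recur across sequencing libraries.

    sample_clusters: dict mapping a sample name to the set of cluster IDs
                     observed in it (hypothetical layout).
    control_samples: set of sample names that are non-template controls.
    min_samples:     assumed recurrence cut-off; not a value from the paper.
    """
    # Invert the mapping: cluster ID -> set of samples containing it.
    occurrences = defaultdict(set)
    for sample, clusters in sample_clusters.items():
        for cluster_id in clusters:
            occurrences[cluster_id].add(sample)

    report = {}
    for cluster_id, samples in occurrences.items():
        if len(samples) >= min_samples:
            report[cluster_id] = {
                "n_samples": len(samples),
                # Presence in a control hints at a reagent or kit contaminant.
                "in_controls": bool(samples & control_samples),
            }
    return report


libraries = {
    "s1": {"c1", "c2"},
    "s2": {"c1"},
    "ntc1": {"c2", "c3"},
}
report = recurrent_clusters(libraries, control_samples={"ntc1"})
```

In this toy input, cluster `c1` recurs only in biological samples, whereas `c2` also appears in a non-template control and would therefore be treated as a plausible contaminant rather than a candidate pathogen.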
Affiliation(s)
- Jens Friis-Nielsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
| | - Kristín Rós Kjartansdóttir
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Sarah Mollerup
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Maria Asplund
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Tobias Mourier
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Randi Holm Jensen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Thomas Arn Hansen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Alba Rey-Iglesia
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Stine Raith Richter
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Ida Broman Nielsen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - David E Alquezar-Planas
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Pernille V S Olsen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Lasse Vinner
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Helena Fridholm
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Lars Peter Nielsen
- Department of Autoimmunology and Biomarkers, Statens Serum Institut, DK-2300 Copenhagen S, Denmark.
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Thomas Sicheritz-Pontén
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
| | - Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
| | - Anders Johannes Hansen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
| | - Jose M G Izarzugaza
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
| | - Søren Brunak
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- NNF Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark.
| |
|
22
|
Reisman S, Hatzopoulos T, Läufer K, Thiruvathukal GK, Putonti C. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1. Evol Bioinform Online 2016; 12:23-7. [PMID: 26819543 PMCID: PMC4718148 DOI: 10.4137/ebo.s32757] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2015] [Revised: 10/18/2015] [Accepted: 10/25/2015] [Indexed: 02/04/2023]
Abstract
As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest.
Affiliation(s)
- Steven Reisman
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA.; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA.; Department of Biology, Loyola University Chicago, Chicago, IL, USA
| | - Thomas Hatzopoulos
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA.; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
| | - Konstantin Läufer
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA.; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
| | - George K Thiruvathukal
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA.; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
| | - Catherine Putonti
- Bioinformatics Program, Loyola University Chicago, Chicago, IL, USA.; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA.; Department of Biology, Loyola University Chicago, Chicago, IL, USA
|
23
|
Possible Human Papillomavirus 38 Contamination of Endometrial Cancer RNA Sequencing Samples in The Cancer Genome Atlas Database. J Virol 2015; 89:8967-73. [PMID: 26085148 DOI: 10.1128/jvi.00822-15] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 03/27/2015] [Accepted: 06/09/2015] [Indexed: 12/17/2022]
Abstract
Viruses are causally associated with a number of human malignancies. In this study, we sought to identify new virus-cancer associations by searching RNA sequencing data sets from >2,000 patients, encompassing 21 cancers from The Cancer Genome Atlas (TCGA), for the presence of viral sequences. In agreement with previous studies, we found human papillomavirus 16 (HPV16) and HPV18 in oropharyngeal cancer and hepatitis B and C viruses in liver cancer. Unexpectedly, however, we found HPV38, a cutaneous form of HPV associated with skin cancer, in 32 of 168 samples from endometrial cancer. In 12 of the HPV38-positive (HPV38(+)) samples, we observed at least one paired read that mapped to both human and HPV38 genomes, indicative of viral integration into the host DNA, something not previously demonstrated for HPV38. The expression levels of HPV38 transcripts were relatively low, and all 32 HPV38(+) samples belonged to the same experimental batch of 40 samples, whereas none of the other 128 endometrial carcinoma samples were HPV38(+), raising doubts about the significance of the HPV38 association. Moreover, the HPV38(+) samples contained the same 10 novel single nucleotide variations (SNVs), leading us to hypothesize that one patient was infected with this new isolate of HPV38, which was integrated into his/her genome and may have cross-contaminated other TCGA samples within batch 228. Based on our analysis, we propose guidelines to examine the batch effect, virus expression level, and SNVs as part of next-generation sequencing (NGS) data analysis for evaluating the significance of viral/pathogen sequences in clinical samples.
IMPORTANCE High-throughput RNA sequencing (RNA-Seq), followed by computational analysis, has vastly accelerated the identification of viral and other pathogenic sequences in clinical samples, but cross-contamination during the processing of the samples remains a major problem that can lead to erroneous conclusions. We found HPV38 sequences specifically present in RNA-Seq samples from endometrial cancer patients from TCGA, a virus not previously associated with this type of cancer. However, multiple lines of evidence suggest possible cross-contamination in these samples, which were processed together in the same batch. Despite this potential cross-contamination, our data indicate that we have detected a new isolate of HPV38 that appears to be integrated into the human genome. We also provide general guidelines for computational detection and interpretation of pathogen-disease associations.
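The batch-effect check described in this abstract can be illustrated with a short calculation. The sketch below is not the authors' method; it assumes a one-sided hypergeometric (Fisher-type) test as one plausible way to ask how likely it is that all 32 positives among 168 samples would fall into a single batch of 40 by chance. The function name `batch_enrichment_p` is hypothetical; the counts are taken from the abstract.

```python
from fractions import Fraction
from math import comb


def batch_enrichment_p(total, positives, batch_size, positives_in_batch):
    """One-sided hypergeometric tail probability (assumed test, not from the paper).

    Probability that a random batch of `batch_size` samples, drawn from
    `total` samples of which `positives` are virus-positive, contains at
    least `positives_in_batch` positives.
    """
    denominator = comb(total, batch_size)
    upper = min(positives, batch_size)
    tail = sum(
        comb(positives, k) * comb(total - positives, batch_size - k)
        for k in range(positives_in_batch, upper + 1)
    )
    # Exact rational arithmetic avoids overflow before the final conversion.
    return float(Fraction(tail, denominator))


# Counts reported in the abstract: 168 samples, 32 HPV38(+),
# one batch of 40 samples containing all 32 positives.
p_value = batch_enrichment_p(168, 32, 40, 32)
print(f"p = {p_value:.3g}")
```

A vanishingly small probability under this assumed null model is consistent with the abstract's conclusion that the positives track the processing batch rather than the disease.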
|
24
|
HeLa nucleic acid contamination in the cancer genome atlas leads to the misidentification of human papillomavirus 18. J Virol 2015; 89:4051-7. [PMID: 25631090 DOI: 10.1128/jvi.03365-14] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 12/22/2022]
Abstract
We searched The Cancer Genome Atlas (TCGA) database for viruses by comparing non-human reads present in transcriptome sequencing (RNA-Seq) and whole-exome sequencing (WXS) data to viral sequence databases. Human papillomavirus 18 (HPV18) is an etiologic agent of cervical cancer, and as expected, we found robust expression of HPV18 genes in cervical cancer samples. In agreement with previous studies, we also found HPV18 transcripts in non-cervical cancer samples, including those from the colon, rectum, and normal kidney. However, in each of these cases, HPV18 gene expression was low, and single-nucleotide variants and positions of genomic alignments matched the integrated portion of HPV18 present in HeLa cells. Chimeric reads that match a known virus-cell junction of HPV18 integrated in HeLa cells were also present in some samples. We hypothesize that HPV18 sequences in these non-cervical samples are due to nucleic acid contamination from HeLa cells. This finding highlights the problems that contamination presents in computational virus detection pipelines.
IMPORTANCE Viruses associated with cancer can be detected by searching tumor sequence databases. Several studies involving searches of the TCGA database have reported the presence of HPV18, a known cause of cervical cancer, in a small number of additional cancers, including those of the rectum, kidney, and colon. We have determined that the sequences related to HPV18 in non-cervical samples are due to nucleic acid contamination from HeLa cells. To our knowledge, this is the first report of the misidentification of viruses in next-generation sequencing data of tumors due to contamination with a cancer cell line. These results raise awareness of the difficulty of accurately identifying viruses in human sequence databases.
|
25
|
Wang Q, Jia P, Zhao Z. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization. Genome Med 2015; 7:2. [PMID: 25699093 PMCID: PMC4333248 DOI: 10.1186/s13073-015-0126-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 12/11/2014] [Accepted: 01/05/2015] [Indexed: 12/28/2022]
Abstract
Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Affiliation(s)
- Qingguo Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA
| | - Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA ; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA ; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA ; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232 USA ; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
|
26
|
Calistri A, Palu G. Editorial Commentary: Unbiased Next-Generation Sequencing and New Pathogen Discovery: Undeniable Advantages and Still-Existing Drawbacks. Clin Infect Dis 2015; 60:889-91. [DOI: 10.1093/cid/ciu913] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
|
27
|
Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 2014; 24:1180-92. [PMID: 24899342 PMCID: PMC4079973 DOI: 10.1101/gr.171934.113] [Citation(s) in RCA: 311] [Impact Index Per Article: 31.1] [Indexed: 12/20/2022]
Abstract
Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
Affiliation(s)
- Samia N Naccache
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Scot Federman
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Narayanan Veeraraghavan
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Matei Zaharia
- Department of Computer Science, University of California, Berkeley, California 94720, USA
| | - Deanna Lee
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Erik Samayoa
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Jerome Bouquet
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | | | - Ka-Cheung Luk
- Abbott Diagnostics, Abbott Park, Illinois 60064, USA
| | - Barryett Enge
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
| | - Debra A Wadford
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
| | - Sharon L Messenger
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
| | - Gillian L Genrich
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
| | - Kristen Pellegrino
- Department of Family and Community Medicine, UCSF, San Francisco, California 94143, USA
| | - Gilda Grard
- Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
| | - Eric Leroy
- Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
| | | | - Joseph N Fair
- Metabiota, Inc., San Francisco, California 94104, USA
| | - Miguel A Martínez
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
| | - Pavel Isa
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
| | - John A Crump
- Division of Infectious Diseases and International Health and the Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina 27708, USA; Kilimanjaro Christian Medical Centre, Moshi, Kilimanjaro, 7393, Tanzania; Centre for International Health, University of Otago, Dunedin, 9054, New Zealand
| | - Joseph L DeRisi
- Department of Biochemistry, UCSF, San Francisco, California 94107, USA
| | - Taylor Sittler
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
| | - John Hackett
- Abbott Diagnostics, Abbott Park, Illinois 60064, USA
| | - Steve Miller
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
| | - Charles Y Chiu
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA; Department of Medicine, Division of Infectious Diseases, UCSF, San Francisco, California 94143, USA
|
28
|
Caboche S, Audebert C, Hot D. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology. Pathogens 2014; 3:258-79. [PMID: 25437800 PMCID: PMC4243446 DOI: 10.3390/pathogens3020258] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/20/2013] [Revised: 02/28/2014] [Accepted: 03/20/2014] [Indexed: 12/19/2022]
Abstract
Recent progress in high-throughput sequencing (HTS) technologies enables easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS, combined with adapted, automatic, and fast bioinformatics solutions for sequencing applications, promises accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis allow genome-based diagnosis that is consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose.
Affiliation(s)
- Ségolène Caboche
- FRE 3642 Molecular and Cellular Medecine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
| | | | - David Hot
- FRE 3642 Molecular and Cellular Medecine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
|
29
|
Borozan I, Watt SN, Ferretti V. Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq. PLoS One 2013; 8:e76935. [PMID: 24204709 PMCID: PMC3813700 DOI: 10.1371/journal.pone.0076935] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/28/2013] [Accepted: 09/04/2013] [Indexed: 01/02/2023]
Abstract
Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, SHRiMP2 and STAR) can be used for characterizing mutated and non-mutated viral sequences, including those that exhibit RNA splicing, in transcriptome samples. To evaluate aligners objectively we developed a realistic RNA-Seq simulation and evaluation framework (RiSER) and propose a new combined score to rank aligners for viral characterization in terms of their precision, sensitivity and alignment accuracy. We used RiSER to simulate both human and viral read sequences and suggest the best set of aligners for viral sequence characterization in human transcriptome samples. Our results show that significant and substantial differences exist between aligners and that a digital-subtraction-based viral identification framework can and should use different aligners for different parts of the process. We determine the extent to which mutated viral sequences can be effectively characterized and show that more sensitive aligners such as BLAST, BFAST, SHRiMP2, BWA-SW and GSNAP can accurately characterize substantially divergent viral sequences with up to 15% overall sequence mutation rate. We believe that the results presented here will be useful to researchers choosing aligners for viral sequence characterization using next-generation sequencing data.
Affiliation(s)
- Ivan Borozan
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Stuart N. Watt
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Vincent Ferretti
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
|
30
|
Sensitive detection of viral transcripts in human tumor transcriptomes. PLoS Comput Biol 2013; 9:e1003228. [PMID: 24098097 PMCID: PMC3789765 DOI: 10.1371/journal.pcbi.1003228] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 12/18/2012] [Accepted: 06/04/2013] [Indexed: 02/07/2023]
Abstract
In excess of % of human cancer incidents have a viral cofactor. Epidemiological studies of idiopathic human cancers indicate that additional tumor viruses remain to be discovered. Recent advances in sequencing technology have enabled systematic screenings of human tumor transcriptomes for viral transcripts. However, technical problems such as low abundances of viral transcripts in large volumes of sequencing data, viral sequence divergence, and homology between viral and human factors significantly confound identification of tumor viruses. We have developed a novel computational approach for detecting viral transcripts in human cancers that takes the aforementioned confounding factors into account and is applicable to a wide variety of viruses and tumors. We apply the approach to conducting the first systematic search for viruses in neuroblastoma, the most common cancer in infancy. The diverse clinical progression of this disease as well as related epidemiological and virological findings are highly suggestive of a pathogenic cofactor. However, a viral etiology of neuroblastoma is currently contested. We mapped transcriptomes of neuroblastoma as well as positive and negative controls to the human and all known viral genomes in order to detect both known and unknown viruses. Analysis of controls, comparisons with related methods, and statistical estimates demonstrate the high sensitivity of our approach. Detailed investigation of putative viral transcripts within neuroblastoma samples did not provide evidence for the existence of any known human viruses. Likewise, de-novo assembly and analysis of chimeric transcripts did not result in expression signatures associated with novel human pathogens. While confounding factors such as sample dilution or viral clearance in progressed tumors may mask viral cofactors in the data, in principle, this is rendered less likely by the high sensitivity of our approach and the number of biological replicates analyzed. 
Therefore, our results suggest that frequent viral cofactors of metastatic neuroblastoma are unlikely.
Many human cancers are caused by infections with tumor viruses and identification of these pathogens is considered a critical contribution to cancer prevention. Deep sequencing enables us to systematically investigate viral nucleotide signatures in order to either verify or exclude the existence of viruses in idiopathic human cancers. We have developed Virana, a novel computational approach for identifying tumor viruses in human cancers that is applicable to a wide variety of tumors and viruses. Virana firstly addresses several important biological confounding factors that may hinder successful detection of these pathogens. We applied our approach in the first systematic search for cancer-causing viruses in metastatic neuroblastoma, the most common form of cancer in infancy. Although the heterogeneous clinical progression of this disease as well as epidemiological and virological findings are suggestive of a pathogenic cofactor, the viral etiology of neuroblastoma is currently contested. We conducted an analysis of experimental controls, comparisons with related approaches, as well as statistical analyses in order to validate our method. In spite of the high sensitivity of our approach, analyses of neuroblastoma transcriptomes did not provide evidence for the existence of any known or unknown human viruses. Our results therefore suggest that frequent viral cofactors of metastatic neuroblastoma are unlikely.
|
31
|
Abstract
Pathogen discovery is critically important to infectious diseases and public health. Nearly all new outbreaks are caused by the emergence of novel viruses. Genomic tools for pathogen discovery include consensus PCR, microarrays, and deep sequencing. Downstream studies are often necessary to link a candidate novel virus to a disease.
Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease.
|
32
|
Naeem R, Rashid M, Pain A. READSCAN: a fast and scalable pathogen discovery program with accurate genome relative abundance estimation. Bioinformatics 2012. [PMID: 23193222 PMCID: PMC3562070 DOI: 10.1093/bioinformatics/bts684] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/19/2022]
Abstract
Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material).
Availability: http://cbrc.kaust.edu.sa/readscan
Contact: arnab.pain@kaust.edu.sa or raeece.naeem@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raeece Naeem
- Pathogen Genomics Laboratory, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal-23955-6900, Kingdom of Saudi Arabia.
|