1
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
- Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
- Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
2
Roland CL, Nassif Haddad EF, Keung EZ, Wang WL, Lazar AJ, Lin H, Chelvanambi M, Parra ER, Wani K, Guadagnolo BA, Bishop AJ, Burton EM, Hunt KK, Torres KE, Feig BW, Scally CP, Lewis VO, Bird JE, Ratan R, Araujo D, Zarzour MA, Patel S, Benjamin R, Conley AP, Livingston JA, Ravi V, Tawbi HA, Lin PP, Moon BS, Satcher RL, Mujtaba B, Witt RG, Traweek RS, Cope B, Lazcano R, Wu CC, Zhou X, Mohammad MM, Chu RA, Zhang J, Damania A, Sahasrabhojane P, Tate T, Callahan K, Nguyen S, Ingram D, Morey R, Crosby S, Mathew G, Duncan S, Lima CF, Blay JY, Fridman WH, Shaw K, Wistuba I, Futreal A, Ajami N, Wargo JA, Somaiah N. A randomized, non-comparative phase 2 study of neoadjuvant immune-checkpoint blockade in retroperitoneal dedifferentiated liposarcoma and extremity/truncal undifferentiated pleomorphic sarcoma. Nat Cancer 2024; 5:625-641. [PMID: 38351182 DOI: 10.1038/s43018-024-00726-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 01/10/2024] [Indexed: 04/30/2024]
Abstract
Based on the demonstrated clinical activity of immune-checkpoint blockade (ICB) in advanced dedifferentiated liposarcoma (DDLPS) and undifferentiated pleomorphic sarcoma (UPS), we conducted a randomized, non-comparative phase 2 trial ( NCT03307616 ) of neoadjuvant nivolumab or nivolumab/ipilimumab in patients with resectable retroperitoneal DDLPS (n = 17) and extremity/truncal UPS (+ concurrent nivolumab/radiation therapy; n = 10). The primary end point of pathologic response (percent hyalinization) was a median of 8.8% in DDLPS and 89% in UPS. Secondary end points were the changes in immune infiltrate, radiographic response, 12- and 24-month relapse-free survival and overall survival. Lower densities of regulatory T cells before treatment were associated with a major pathologic response (hyalinization > 30%). Tumor infiltration by B cells was increased following neoadjuvant treatment and was associated with overall survival in DDLPS. B cell infiltration was associated with higher densities of regulatory T cells before treatment, which was lost upon ICB treatment. Our data demonstrate that neoadjuvant ICB is associated with complex immune changes within the tumor microenvironment in DDLPS and UPS and that neoadjuvant ICB with concurrent radiotherapy has significant efficacy in UPS.
Affiliation(s)
- Christina L Roland
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Elise F Nassif Haddad
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Centre Léon-Bérard, University Claude Bernard Lyon I, Lyon, France
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Emily Z Keung
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei-Lien Wang
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Alexander J Lazar
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Heather Lin
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Manoj Chelvanambi
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Edwin R Parra
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Khalida Wani
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- B Ashleigh Guadagnolo
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andrew J Bishop
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Elizabeth M Burton
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kelly K Hunt
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Keila E Torres
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Barry W Feig
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Christopher P Scally
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Valerae O Lewis
- Department of Orthopedic Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Justin E Bird
- Department of Orthopedic Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ravin Ratan
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Dejka Araujo
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- M Alexandra Zarzour
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shreyaskumar Patel
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Robert Benjamin
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Anthony P Conley
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- J Andrew Livingston
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Vinod Ravi
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hussein A Tawbi
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Patrick P Lin
- Department of Orthopedic Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Bryan S Moon
- Department of Orthopedic Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Robert L Satcher
- Department of Orthopedic Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Bilal Mujtaba
- Department of Musculoskeletal Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Russell G Witt
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Raymond S Traweek
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Brandon Cope
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rossana Lazcano
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Chia-Chin Wu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xiao Zhou
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Mohammad M Mohammad
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Randy A Chu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jianhua Zhang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ashish Damania
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Pranoti Sahasrabhojane
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Taylor Tate
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kate Callahan
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sa Nguyen
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Davis Ingram
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rohini Morey
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shadarra Crosby
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Grace Mathew
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sheila Duncan
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Cibelle F Lima
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jean-Yves Blay
- Centre Léon-Bérard, University Claude Bernard Lyon I, Lyon, France
- Wolf Herman Fridman
- Centre de Recherche des Cordeliers, Inserm, Université Paris-Cité, Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Kenna Shaw
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ignacio Wistuba
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andrew Futreal
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Nadim Ajami
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jennifer A Wargo
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Neeta Somaiah
- Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
3
Zhao Y, Huang F, Wang W, Gao R, Fan L, Wang A, Gao SH. Application of high-throughput sequencing technologies and analytical tools for pathogen detection in urban water systems: Progress and future perspectives. Sci Total Environ 2023; 900:165867. [PMID: 37516185 DOI: 10.1016/j.scitotenv.2023.165867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 07/31/2023]
Abstract
The ubiquitous presence of pathogenic microorganisms, such as viruses, bacteria, fungi, and protozoa, in urban water systems poses a significant risk to public health. Infectious waterborne diseases mediated by urban water systems have become one of the leading global causes of mortality. However, the detection and monitoring of these pathogenic microorganisms have been limited by the complexity and diversity of environmental samples. Conventional methods are restricted by long assay times, demanding identification requirements, and narrow application scenarios. Novel technologies, such as high-throughput sequencing, potentially enable full-spectrum detection of trace pathogenic microorganisms in complex environmental matrices. This review concisely summarizes the current state of high-throughput sequencing technologies for identifying pathogenic microorganisms in urban water systems. Looking ahead, it emphasizes the need for detection methods with high accuracy and sensitivity, the establishment of precise detection standards and procedures, and the importance of bioinformatics software and platforms. We have compiled a list of recommended pathogen analysis software, platforms, and databases with robust engines and high accuracy. We highlight the value of combining targeted and non-targeted sequencing, short- and long-read technologies, and bioinformatic tools in pursuing improved biosafety in urban water systems.
Affiliation(s)
- Yanmei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
- Fang Huang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Wenxiu Wang
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China.
- Rui Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Lu Fan
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Aijie Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Shu-Hong Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
4
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023] [Open access]
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
- Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
5
Privitera GF, Alaimo S, Ferro A, Pulvirenti A. Virus finding tools: current solutions and limitations. Brief Bioinform 2022; 23:6618234. [PMID: 35753694 DOI: 10.1093/bib/bbac235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/02/2022] [Accepted: 05/20/2022] [Indexed: 11/13/2022] [Open access]
Abstract
MOTIVATION The study of the human virome remains challenging. Viral metagenomics based on high-throughput sequencing data is the best choice for virus discovery. The metagenomics approach is culture-independent and sequence-independent, which helps in searching for both known and novel viruses. Although it is estimated that more than 40% of the viruses found in metagenomic analyses are not recognizable, we decided to analyze several tools for identifying and discovering viruses in RNA-seq samples. RESULTS We analyzed eight virus-finding tools for the identification of viruses in RNA-seq data. These tools were compared using a synthetic dataset of 30 viruses and a real dataset. Our analysis shows that no tool recognizes all the viruses in the datasets. We therefore conclude that each of these tools has pros and cons, and that the choice among them depends on the application domain. AVAILABILITY Synthetic data used throughout the review and raw results of their analysis can be found at https://zenodo.org/record/6426147. FASTQ files of real data can be found in GEO (https://www.ncbi.nlm.nih.gov/gds) or ENA (https://www.ebi.ac.uk/ena/browser/home). Raw results of their analysis can be downloaded from https://zenodo.org/record/6425917.
Affiliation(s)
- Grete Francesca Privitera
- Department of Physics and Astronomy, University of Catania, Viale A. Doria, 6, 95125, Catania, Italy
- Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
6
Wang W, Chen Y, Wu L, Zhang Y, Yoo S, Chen Q, Liu S, Hou Y, Chen XP, Chen Q, Zhu J. HBV genome-enriched single cell sequencing revealed heterogeneity in HBV-driven hepatocellular carcinoma (HCC). BMC Med Genomics 2022; 15:134. [PMID: 35710421 PMCID: PMC9205089 DOI: 10.1186/s12920-022-01264-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/21/2021] [Accepted: 05/05/2022] [Indexed: 11/11/2022] [Open access]
Abstract
BACKGROUND Hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) is heterogeneous and frequently contains multifocal tumors, but how the multifocal tumors relate to each other in terms of HBV integration and other genomic patterns is not clear. METHODS To interrogate the heterogeneity of HBV-HCC, we developed an HBV genome-enriched single-cell sequencing (HGE-scSeq) procedure and a computational method to identify HBV integration sites and infer DNA copy number variations (CNVs). RESULTS We performed HGE-scSeq on 269 cells from four tumor sites and two tumor thrombi of an HBV-HCC patient. HBV integrations were identified in 142 of the 269 (53%) cells sequenced and were enriched in two HBV integration hotspots, chr1:34,397,059 (CSMD2) and chr8:118,557,327 (MED30/EXT1). There were also 162 rare integration sites. HBV integration sites were enriched in DNA fragile sites, and sequences around HBV integration sites were enriched for microhomologous sequences between the human and HBV genomes. CNVs were inferred for each individual cell, and cells were grouped into four clonal groups based on their CNVs. Cells in different clonal groups had different degrees of HBV integration heterogeneity. All 269 cells carried chromosome 1q amplification, a recurrent feature of HCC tumors, suggesting that 1q amplification occurred before HBV integration events in this case study. Further, we performed simulation studies to demonstrate that the sequential events (HBV infecting transformed cells) could result in the observed phenotype with biologically reasonable parameters. CONCLUSION Our HGE-scSeq data reveal high heterogeneity of HCC tumor cells in terms of both HBV integrations and CNVs. There were two HBV integration hotspots across cells, and cells from multiple tumor sites shared some HBV integration and CNV patterns.
Affiliation(s)
- Wenhui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave., New York, NY, 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Yan Chen
- The Hepatic Surgery Centre at Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology (HUST), Wuhan, China
- Yi Zhang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei, China
- Seungyeul Yoo
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave., New York, NY, 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Quan Chen
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave., New York, NY, 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Xiao-Ping Chen
- The Hepatic Surgery Centre at Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology (HUST), Wuhan, China
- Qian Chen
- The Division of Gastroenterology, Department of Internal Medicine at Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology (HUST), Wuhan, China.
- Jun Zhu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave., New York, NY, 10029, USA.
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Sema4, Stamford, CT, USA.
- The Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
7
Whole-Genome Sequencing Reveals Age-Specific Changes in the Human Blood Microbiota. J Pers Med 2022; 12:jpm12060939. [PMID: 35743724 PMCID: PMC9225573 DOI: 10.3390/jpm12060939] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/12/2022] [Revised: 06/03/2022] [Accepted: 06/06/2022] [Indexed: 11/17/2022] [Open access]
Abstract
Based on several reports indicating the presence of blood microbiota in patients with diseases, we became interested in identifying bacteria in the blood of healthy individuals. Using 37 samples from 5 families, we extracted sequences that did not map to the human reference genome and mapped them to bacterial reference genomes for characterization. Proteobacteria accounted for more than 95% of the blood microbiota. Clustering by principal component analysis showed similar patterns for each age group. We observed that the class Gammaproteobacteria was significantly more abundant in the elderly group (over 60 years old), whereas the arcsine square root-transformed relative abundances of the classes Alphaproteobacteria, Deltaproteobacteria, and Clostridia were significantly lower (p < 0.05). In addition, diversity differed significantly (p < 0.05) in the elderly group. These results provide meaningful evidence consistent with the phenomenon that chronic diseases associated with aging are accompanied by metabolic endotoxemia and chronic inflammation.
8
Bernasconi A, Cascianelli S. Scenarios for the Integration of Microarray Gene Expression Profiles in COVID-19-Related Studies. Methods Mol Biol 2022; 2401:195-215. [PMID: 34902130 DOI: 10.1007/978-1-0716-1839-4_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The COVID-19 pandemic has heavily affected many aspects of our lives. Genomic research is currently concerned with exploiting available datasets and knowledge to fuel discovery on this novel disease. Studies that can precisely characterize the gene expression profiles of human hosts infected by SARS-CoV-2 are highly relevant. However, few such experiments have been produced to date, and fewer still have been made publicly available online. It is therefore of paramount importance that data analysts explore every possibility for integrating information from similar viruses and related diseases; microarray gene expression profiling experiments are particularly valuable for this purpose. This chapter reviews the aspects that should be considered when integrating transcriptomics data, mainly considering samples infected by different viruses and combining various data types as well as the extracted knowledge. It describes a series of scenarios from studies in the literature and suggests other noteworthy directions for integration.
Affiliation(s)
- Anna Bernasconi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy.
- Silvia Cascianelli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
9
Scott MA, Woolums AR, Swiderski CE, Perkins AD, Nanduri B, Smith DR, Karisch BB, Epperson WB, Blanton JR. Comprehensive at-arrival transcriptomic analysis of post-weaned beef cattle uncovers type I interferon and antiviral mechanisms associated with bovine respiratory disease mortality. PLoS One 2021; 16:e0250758. [PMID: 33901263 PMCID: PMC8075194 DOI: 10.1371/journal.pone.0250758] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/31/2020] [Accepted: 04/13/2021] [Indexed: 12/02/2022]
Abstract
BACKGROUND: Despite decades of extensive research, bovine respiratory disease (BRD) remains the most devastating disease in beef cattle production. Clinical diagnosis often relies on visual detection of non-specific signs, leading to low diagnostic accuracy. Consequently, post-weaned beef cattle are often metaphylactically administered antimicrobials at facility arrival, which raises concerns regarding antimicrobial stewardship and resistance. Additionally, there is a lack of high-quality research addressing the gene-by-environment interactions that explain why some cattle that develop BRD die while others survive. It is therefore necessary to decipher the host genomic factors associated with BRD mortality versus survival to help determine BRD risk and severity. Using transcriptomic analysis of at-arrival whole blood samples from cattle that died of BRD, compared with those that developed signs of BRD but survived (n = 3 DEAD, n = 3 ALIVE), we identified differentially expressed genes (DEGs) and associated pathways in cattle that died of BRD. We also evaluated unmapped reads, which are often overlooked in transcriptomic experiments. RESULTS: Sixty-nine DEGs (FDR < 0.10) were identified between the ALIVE and DEAD cohorts. Several DEGs have immunological and proinflammatory functions and associations with TLR4 and IL6. Biological processes, pathways, and disease phenotype associations related to type I interferon production and antiviral defense were enriched in DEAD cattle at arrival. Unmapped reads aligned primarily to various ungulate assemblies but failed to align to viral assemblies. CONCLUSION: This study further revealed increased proinflammatory immunological mechanisms in cattle that develop BRD. DEGs upregulated in DEAD cattle were predominantly involved in innate immune pathways typically associated with antiviral defense, although no viral genes were identified within unmapped reads.
Our findings provide genomic targets for further analysis in cattle at highest risk of BRD, suggesting that mechanisms related to type I interferons and antiviral defense may be indicative of viral respiratory disease at arrival and contribute to eventual BRD mortality.
Affiliation(s)
- Matthew A. Scott
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, United States of America
- Amelia R. Woolums
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, United States of America
- Cyprianna E. Swiderski
- Department of Clinical Sciences, Mississippi State University, Mississippi State, MS, United States of America
- Andy D. Perkins
- Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS, United States of America
- Bindu Nanduri
- Department of Basic Sciences, Mississippi State University College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, United States of America
- David R. Smith
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, United States of America
- Brandi B. Karisch
- Department of Animal and Dairy Sciences, Mississippi State University, Mississippi State, MS, United States of America
- William B. Epperson
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, United States of America
- John R. Blanton
- Department of Animal and Dairy Sciences, Mississippi State University, Mississippi State, MS, United States of America
10
Mwesigwa S, Williams L, Retshabile G, Katagirya E, Mboowa G, Mlotshwa B, Kyobe S, Kateete DP, Wampande EM, Wayengera M, Mpoloka SW, Mirembe AN, Kasvosve I, Morapedi K, Kisitu GP, Kekitiinwa AR, Anabwani G, Joloba ML, Matovu E, Mulindwa J, Noyes H, Botha G, Brown CW, Mardon G, Matshaba M, Hanchard NA. Unmapped exome reads implicate a role for Anelloviridae in childhood HIV-1 long-term non-progression. NPJ Genom Med 2021; 6:24. [PMID: 33741997 PMCID: PMC7979878 DOI: 10.1038/s41525-021-00185-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/28/2020] [Accepted: 01/25/2021] [Indexed: 01/31/2023]
Abstract
Human immunodeficiency virus (HIV) infection remains a significant public health burden globally. A role for viral co-infection in the rate of progression of HIV infection has been suggested but not empirically tested, particularly among children. We extracted and classified 42 viral species from whole-exome sequencing (WES) data of 813 HIV-infected children in Botswana and Uganda categorised as either long-term non-progressors (LTNPs) or rapid progressors (RPs). The Ugandan participants had a higher viral community diversity index than the Batswana participants (p = 4.6 × 10⁻¹³), and viral sequences were more frequently detected among LTNPs than RPs (24% vs 16%; p = 0.008; OR, 1.9; 95% CI, 1.6-2.3), with Anelloviridae showing a strong association with LTNP status (p = 3 × 10⁻⁴; q = 0.004; OR, 3.99; 95% CI, 1.74-10.25). This trend was still evident when stratified by country, sex, and sequencing platform, and after a logistic regression analysis adjusting for age, sex, country, and sequencing platform (p = 0.02; q = 0.03; OR, 7.3; 95% CI, 1.6-40.5). Torque teno virus (TTV), which made up 95% of the Anelloviridae reads, has been associated with reduced immune activation. We identify an association between viral co-infection and prolonged AIDS-free survival status that may have utility as a biomarker of LTNP and could provide mechanistic insights into HIV progression in children, demonstrating the added value of interrogating off-target WES reads in cohort studies.
Affiliation(s)
- Eric Katagirya
- College of Health Sciences, Makerere University, Kampala, Uganda
- Gerald Mboowa
- College of Health Sciences, Makerere University, Kampala, Uganda
- Samuel Kyobe
- College of Health Sciences, Makerere University, Kampala, Uganda
- David P Kateete
- College of Health Sciences, Makerere University, Kampala, Uganda
- Misaki Wayengera
- College of Health Sciences, Makerere University, Kampala, Uganda
- Angella N Mirembe
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
- Grace P Kisitu
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
- Adeodata R Kekitiinwa
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
- Gabriel Anabwani
- Botswana-Baylor Children's Clinical Centre of Excellence, Gaborone, Botswana
- Moses L Joloba
- College of Health Sciences, Makerere University, Kampala, Uganda
- Enock Matovu
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
- Julius Mulindwa
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
- Harry Noyes
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Gerrit Botha
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Chester W Brown
- University of Tennessee Health Science Center, Le Bonheur Children's Hospital, Memphis, TN, USA
- Graeme Mardon
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Mogomotsi Matshaba
- Botswana-Baylor Children's Clinical Centre of Excellence, Gaborone, Botswana
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Neil A Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
11
Chen X, Li D. Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data. Genomics 2021; 113:1189-1198. [PMID: 33301893 PMCID: PMC7856238 DOI: 10.1016/j.ygeno.2020.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/25/2020] [Revised: 11/25/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
Numerous viral sequences have been reported in whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random mapping or noise originating from sample preparation, sequencing processes, or other sources. Identifying patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among the DNA sources and sequencing facilities that produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads differed significantly among the seven sequencing facilities (P < 2 × 10⁻¹⁶). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value < 0.01). Possible origins of these reads include artificial sequences introduced during library preparation, recombinant vectors in cell culture, and phages present as co-contaminants with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.
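The first step described in this abstract, isolating paired-end reads that did not align to the human reference, can be sketched with the standard SAM FLAG bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped). The Python sketch below is a generic illustration under that assumption, not the authors' pipeline; the function names and example records are hypothetical, and in practice an equivalent filter is often applied with `samtools view -f 12`.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1           # template has multiple segments (paired-end read)
UNMAPPED = 0x4         # this segment is unmapped
MATE_UNMAPPED = 0x8    # the next segment in the template (the mate) is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def both_mates_unmapped(flag: int) -> bool:
    """True for a primary record of a read pair in which neither mate aligned."""
    required = PAIRED | UNMAPPED | MATE_UNMAPPED
    is_primary = (flag & (SECONDARY | SUPPLEMENTARY)) == 0
    return (flag & required) == required and is_primary


def unmapped_pair_names(sam_lines) -> set[str]:
    """Collect the QNAMEs of fully unmapped read pairs from SAM text lines."""
    names: set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])  # QNAME is column 1, FLAG is column 2
        if both_mates_unmapped(flag):
            names.add(qname)
    return names


# Hypothetical records: r1 is an unmapped pair, r2 is a mapped proper pair.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",             # 0x1|0x4|0x8|0x40
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF",            # 0x1|0x4|0x8|0x80
    "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",  # mapped proper pair
]
print(unmapped_pair_names(example))
```

Requiring both 0x4 and 0x8 keeps only pairs in which neither mate aligned, so pairs with one mapped mate are excluded by this filter.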
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, VT 05405, USA.
12
Buggiotti L, Cheng Z, Wathes DC, GplusE Consortium. Mining the Unmapped Reads in Bovine RNA-Seq Data Reveals the Prevalence of Bovine Herpes Virus-6 in European Dairy Cows and the Associated Changes in Their Phenotype and Leucocyte Transcriptome. Viruses 2020; 12:v12121451. [PMID: 33339352 PMCID: PMC7768445 DOI: 10.3390/v12121451] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/22/2020] [Revised: 12/08/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Microbial RNA is detectable in host samples by aligning unmapped reads from RNA sequencing against taxon reference sequences, generating a score proportional to the microbial load. An RNA-Seq data analysis showed that 83.5% of leukocyte samples from six dairy herds in different EU countries contained bovine herpes virus-6 (BoHV-6). Phenotypic data on milk production, metabolic function, and disease collected during their first 50 days in milk (DIM) were compared between cows with low (score 1–200; n = 114) and high (score 201–1175; n = 24) BoHV-6 scores. There were no differences in milk production parameters, but high-score cows had numerically fewer incidences of clinical mastitis (4.2% vs. 12.2%) and uterine disease (54.5% vs. 62.7%). Their metabolic status was worse, based on measurements of IGF-1 and various metabolites in blood and milk. A comparison of the global leukocyte transcriptome between high- and low-score cows at around 14 DIM yielded 485 differentially expressed genes (DEGs). The top pathway from Gene Ontology (GO) enrichment analysis was the immune system process. Down-regulated genes in the high-score cows included those encoding proteins involved in viral detection (DDX6 and DDX58), interferon response, and E3 ubiquitin ligase activity. This suggests that BoHV-6 may largely evade viral detection and does not cause clinical disease in dairy cows.
13
Rodriguez RM, Khadka VS, Menor M, Hernandez BY, Deng Y. Tissue-associated microbial detection in cancer using human sequencing data. BMC Bioinformatics 2020; 21:523. [PMID: 33272199 PMCID: PMC7713026 DOI: 10.1186/s12859-020-03831-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/19/2022]
Abstract
Cancer is one of the leading causes of morbidity and mortality worldwide. Microbiological infections account for up to 20% of the total global cancer burden. The human microbiota of each organ system is distinct, and its compositional variation and interactions with the human host are known to exert both detrimental and beneficial effects on tumor progression. With the advent of next-generation sequencing (NGS) technologies, NGS data are being used for pathogen detection in cancer. Numerous bioinformatics computational frameworks have been developed to study viral information from host sequencing data, and these can be adapted to bacterial studies. This review highlights popular existing computational frameworks that take NGS data as input to decipher microbial composition and whose output can predict functional and compositional differences with clinically relevant applicability in the development of treatment and prevention strategies.
Affiliation(s)
- Rebecca M. Rodriguez
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
- NIDDK Central Repository, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, USA
- Vedbar S. Khadka
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Mark Menor
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Brenda Y. Hernandez
- Epidemiology, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
- Youping Deng
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
14
Yuan Z, Ye X, Zhu L, Zhang N, An Z, Zheng WJ. Virome assembly and annotation in brain tissue based on next-generation sequencing. Cancer Med 2020; 9:6776-6790. [PMID: 32738030 PMCID: PMC7520322 DOI: 10.1002/cam4.3325] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/20/2019] [Revised: 06/20/2020] [Accepted: 07/01/2020] [Indexed: 12/15/2022]
Abstract
Glioblastoma multiforme (GBM) is one of the deadliest tumors. It has been speculated that viruses play a role in GBM, but the evidence is controversial. Published research is mainly limited to studies on the presence of human cytomegalovirus (HCMV) in GBM. No comprehensive assessment of the brain virome (the collection of viral material in the brain) based on recently sequenced data has been performed. Here, we characterized the virome of 111 GBM samples and 57 normal brain samples from eight projects in the SRA database using a tested and comprehensive assembly approach. Annotation of the assembled contigs showed that most viral sequences in the brain belong to the viral family Retroviridae. In some GBM samples, we also detected the full genome sequence of a novel picornavirus recently discovered in invertebrates. Unlike previous reports, our study did not detect herpesviruses such as HCMV in GBM in the data we used. However, some contigs that could not be annotated with any known genes exhibited antibody epitopes in their sequences. These findings provide several avenues for potential cancer therapy: the newly discovered picornavirus could be a starting point for engineering a novel oncolytic virus, and the exhibited antibody epitopes could be a source for exploring potential drug targets for cancer immunotherapy. By characterizing the virosphere in GBM and normal brain at a global level, the results of this study strengthen the link between GBM and viral infection, which warrants further investigation.
Affiliation(s)
- Zihao Yuan
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaohua Ye
- Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Lisha Zhu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ningyan Zhang
- Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhiqiang An
- Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- W. Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
15
Chen X, Kost J, Li D. Comprehensive comparative analysis of methods and software for identifying viral integrations. Brief Bioinform 2020; 20:2088-2097. [PMID: 30102374 DOI: 10.1093/bib/bby070] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/18/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
Many viruses are capable of integrating into the human genome, particularly viruses involved in tumorigenesis. Viral integrations can be considered genetic markers for discovering virus-caused cancers and inferring cancer cell development. Next-generation sequencing (NGS) technologies have been widely used to screen for viral integrations in cancer genomes, and a number of bioinformatics tools have been developed to detect viral integrations using NGS data. However, there has been no systematic comparison of these methods or software. In this study, we performed a comprehensive comparative analysis of the designs, performance, functionality, and limitations of the existing methods and software for detecting viral integrations. We further compared the sensitivity, precision, and runtime of integration detection for four representative tools. Our analyses showed that each existing tool had its own merits; however, none was sufficient for parallel or accurate virome-wide detection. After carefully evaluating the limitations shared by the existing methods, we proposed strategies and directions for developing virome-wide integration detection.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Jason Kost
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA; Cancer Center, University of Vermont, Burlington, Vermont 05405, USA
16
Giannuzzi D, Aresu L. A First NGS Investigation Suggests No Association Between Viruses and Canine Cancers. Front Vet Sci 2020; 7:365. [PMID: 32766289 PMCID: PMC7380080 DOI: 10.3389/fvets.2020.00365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/12/2020] [Accepted: 05/26/2020] [Indexed: 12/16/2022]
Abstract
Approximately 10–15% of human cancers worldwide are attributable to viral infection. When operating as carcinogenic elements, viruses may act through various mechanisms, the most important being viral integration into the host genome, which causes chromosomal instability, genomic mutations, and aberrations. In canine species, few reports have described an association between viral integration and canine cancers, but more comprehensive studies are needed. Advances in next-generation sequencing and reduced costs have led to a progressive increase in sequencing data in veterinary oncology, offering an opportunity to study the virome in canine cancers. In this study, we performed viral detection and integration analyses using the VirusFinder2 software tool on available whole-genome and whole-exome sequencing data from different canine cancers. Several viral sequences were detected in lymphomas, hemangiosarcomas, melanomas, and osteosarcomas, but no reliable integration sites were identified. Despite some limitations, such as the depth and type of sequencing, the restricted number of software tools available for nonhuman genomes, and limited knowledge of endogenous retroviruses in the canine genome, the results are compelling. However, further experiments are needed and, as for feline species, dedicated analysis tools for identifying viral integration sites in canine cancers are required.
Affiliation(s)
- Diana Giannuzzi
- Department of Comparative Biomedicine and Food Science, University of Padua, Legnaro, Italy
- Luca Aresu
- Department of Veterinary Science, University of Turin, Grugliasco, Italy
17
Johansson P, Klein-Hitpass L, Budeus B, Kuhn M, Lauber C, Seifert M, Roeder I, Pförtner R, Stuschke M, Dührsen U, Eckstein A, Dürig J, Küppers R. Identifying Genetic Lesions in Ocular Adnexal Extranodal Marginal Zone Lymphomas of the MALT Subtype by Whole Genome, Whole Exome and Targeted Sequencing. Cancers (Basel) 2020; 12:cancers12040986. [PMID: 32316399 PMCID: PMC7225979 DOI: 10.3390/cancers12040986] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/10/2020] [Revised: 04/06/2020] [Accepted: 04/15/2020] [Indexed: 12/22/2022]
Abstract
The pathogenesis of ocular adnexal marginal zone lymphomas of mucosa-associated lymphoid tissue type (OAML) is not fully understood. We performed whole genome sequencing (WGS) and/or whole exome sequencing (WES) for 13 cases of OAML and sequenced 38 genes selected from this analysis in a large cohort of 82 OAML. Besides confirming frequent mutations in the genes transducin beta like 1 X-linked receptor 1 (TBL1XR1) and CREB binding protein (CREBBP), we newly identified JAK3 as a frequently mutated gene in OAML (11% of cases). In our retrospective cohort, JAK3-mutant cases had shorter progression-free survival than unmutated cases. Other newly identified genes recurrently mutated in 5-10% of cases included members of the collagen family (collagen type XII alpha 1 chain (COL12A1) and collagen type I alpha 2 chain (COL1A2)) and DOCK8. Evaluation of the WGS data of six OAML revealed neither translocations nor a current infection of the lymphoma cells by viruses. Evaluation of the WGS data for copy number aberrations confirmed frequent loss of TNFAIP3 and revealed recurrent gains of the NOTCH target HES4 and of members of the CEBP transcription factor family. Overall, we identified several novel genes recurrently affected by point mutations or copy number alterations, but our study also indicated that the landscape of frequently (>10% of cases) mutated protein-coding genes in OAML is now largely known.
Affiliation(s)
- Patricia Johansson
- Department of Hematology, University Hospital Essen, University of Duisburg-Essen, 45147 Essen, Germany
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, 45147 Essen, Germany
- Correspondence: Tel.: +49-201-723-85845
- Ludger Klein-Hitpass
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, 45147 Essen, Germany
- Bettina Budeus
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, 45147 Essen, Germany
- Matthias Kuhn
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technical University Dresden, 01307 Dresden, Germany
- Chris Lauber
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technical University Dresden, 01307 Dresden, Germany
- Michael Seifert
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technical University Dresden, 01307 Dresden, Germany
- Ingo Roeder
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technical University Dresden, 01307 Dresden, Germany
- Roman Pförtner
- Department of Oral and Cranio-Maxillofacial Surgery, Kliniken Essen-Mitte, Evang. Huyssens-Stiftung/Knappschaft GmbH, University Hospital of Essen, 45136 Essen, Germany
- Martin Stuschke
- Department of Radiotherapy, University Hospital Essen, 45147 Essen, Germany
- Ulrich Dührsen
- Department of Hematology, University Hospital Essen, University of Duisburg-Essen, 45147 Essen, Germany
- Anja Eckstein
- Department of Ophthalmology, Molecular Ophthalmology Group, University of Duisburg-Essen, 45147 Essen, Germany
- Jan Dürig
- Department of Hematology, University Hospital Essen, University of Duisburg-Essen, 45147 Essen, Germany
- German Cancer Consortium (DKTK), 45147 Essen, Germany
- Ralf Küppers
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, 45147 Essen, Germany
- German Cancer Consortium (DKTK), 45147 Essen, Germany
18
Abstract
Colorectal cancer (CRC) is a leading cause of cancer-related deaths both in the USA and worldwide. Recent research has demonstrated the involvement of the gut microbiota in CRC development and progression. Microbial biomarkers of disease have focused primarily on the bacterial component of the microbiome; however, the viral portion of the microbiome, consisting of both bacteriophages and eukaryotic viruses and together known as the virome, has been less studied. Here we review recent advances in high-throughput sequencing (HTS) technologies and bioinformatics that have enabled scientists to better understand how viruses might influence the development of colorectal cancer. We discuss contemporary findings revealing modulations in the virome and their correlation with CRC development and progression. While viral HTS detection in clinical specimens still faces a variety of challenges, we consider numerous next steps for future basic and clinical research. Clinicians need to move away from a single-infectious-agent model of disease etiology by embracing new, more encompassing etiological paradigms in which communities of various microbial components interact with each other and the host. The reporting and indexing of patient health information, socioeconomic data, and other relevant metadata will enable identification of predictive variables and covariates of viral presence and CRC development. Altogether, the virome plays a more profound role in carcinogenesis and cancer progression than once thought, and viruses, whether specific for human cells or for bacteria, are clinically relevant to understanding CRC pathology, patient prognosis, and treatment development.
19
Chen X, Kost J, Sulovari A, Wong N, Liang WS, Cao J, Li D. A virome-wide clonal integration analysis platform for discovering cancer viral etiology. Genome Res 2019; 29:819-830. [PMID: 30872350 PMCID: PMC6499315 DOI: 10.1101/gr.242529.118] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/31/2018] [Accepted: 03/11/2019] [Indexed: 12/31/2022]
Abstract
Oncoviral infection is responsible for 12%–15% of human cancers. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the identification of genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform that uses whole-genome sequencing (WGS) data to identify viral integrations that are derived from any characterized virus and shared by a large proportion of tumor cells. Its sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK virus (BKV). Examples include HBV at the TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma, suggesting that these viruses are involved in the early stages of tumorigenesis in the affected tumors. We also showed that VIcaller can identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching for clonal viral integrations with our platform can identify virus-caused cancers and help discover cancer viral etiologies.
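The abstract defines a clonal integration as one shared by a large proportion of tumor cells. The sketch below shows one way such a proportion could be estimated from read counts. It is not VIcaller's algorithm. It assumes:

- a diploid locus with the integration on a single allele;
- no copy-number change at the site;
- a known tumor purity;
- a hypothetical cutoff of 0.5 for calling an event clonal.

The function names are illustrative only.

```python
def integration_cell_fraction(support_reads: int, ref_reads: int, purity: float) -> float:
    """Hypothetical estimate of the fraction of tumor cells carrying a viral integration.

    Assumptions (not from the source): diploid locus, integration on one allele,
    no copy-number change, and reads sampled in proportion to allele abundance.
    support_reads: reads spanning the virus-host junction.
    ref_reads: reads spanning the same site on the unintegrated allele.
    """
    total = support_reads + ref_reads
    if total == 0 or not 0 < purity <= 1:
        raise ValueError("need at least one read and 0 < purity <= 1")
    vaf = support_reads / total      # allele fraction of the integrated junction
    fraction = 2 * vaf / purity      # heterozygous + diploid: vaf = purity * fraction / 2
    return min(fraction, 1.0)        # cap sampling noise at 100% of tumor cells


def is_clonal(fraction: float, threshold: float = 0.5) -> bool:
    # "Clonal" here means carried by at least `threshold` of tumor cells (assumed cutoff)
    return fraction >= threshold
```

For example, 20 junction-supporting reads against 80 reference-spanning reads, in a sample of 80% purity, give an estimated cell fraction of 0.5. Copy-number changes at the integration site are common in tumors, and they would break the simple 2 * VAF / purity relation used here.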
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Jason Kost
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Nathalie Wong
- Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong 999077, P.R. China
- Winnie S Liang
- Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
- Jian Cao
- Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA; Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
20
Xia Y, Liu Y, Deng M, Xi R. Detecting virus integration sites based on multiple related sequencing data by VirTect. BMC Med Genomics 2019; 12:19. [PMID: 30704462 PMCID: PMC6357354 DOI: 10.1186/s12920-018-0461-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]
Abstract
Background: Because tumors often show a high level of intra-tumor heterogeneity, multiple samples from the same patient, taken at different locations or time points, are frequently sequenced to study intra-tumor heterogeneity or tumor evolution. In virus-related tumors, such as human papillomavirus- and hepatitis B virus-related tumors, integration of the viral genome can be a critical driving event, so it is important to identify the integration sites. A few algorithms for detecting virus integration sites from high-throughput sequencing data have been developed. However, their limited sensitivity and specificity and their high computational cost hinder their application to sequencing data from multiple related tumor samples. Results: We developed VirTect to detect virus integration sites simultaneously from multiple related samples. The algorithm is mainly based on the joint analysis of short reads that span the breakpoints of integration sites across multiple samples. To achieve high specificity and breakpoint accuracy, it uses a local precise sandwich alignment algorithm. Simulation and real-data analyses show that, compared with other algorithms, VirTect is significantly more sensitive and has a similar or lower false discovery rate. Conclusions: VirTect provides more accurate breakpoint positions and is computationally much more efficient in terms of both memory requirements and running time.
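The sketch below illustrates the benefit of joint analysis across related samples:

- junction-spanning reads from all samples are pooled;
- host breakpoints that fall within a small window of each other are clustered;
- a site is reported when the pooled support reaches a threshold, even if no single sample reaches it on its own.

This is not VirTect's implementation, and its sandwich alignment step is not modeled. The input format, the 10 bp window and the 3-read threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def joint_breakpoints(samples, window=10, min_support=3):
    """Pool junction-spanning reads from related samples and call shared breakpoints.

    samples: dict of sample name -> list of (chrom, pos) host breakpoints, one entry
             per junction-spanning read (hypothetical input format).
    Returns a list of (chrom, representative_pos, total_reads, supporting_samples).
    """
    by_chrom = defaultdict(list)
    for name, hits in samples.items():
        for chrom, pos in hits:
            by_chrom[chrom].append((pos, name))

    calls = []
    for chrom in sorted(by_chrom):
        hits = sorted(by_chrom[chrom])
        clusters = [[hits[0]]]
        for pos, name in hits[1:]:
            # Extend the current cluster if this breakpoint lies near the previous one
            if pos - clusters[-1][-1][0] <= window:
                clusters[-1].append((pos, name))
            else:
                clusters.append([(pos, name)])
        for cluster in clusters:
            if len(cluster) >= min_support:          # support counted across all samples
                positions = [p for p, _ in cluster]  # already sorted
                calls.append((chrom,
                              positions[len(positions) // 2],  # (upper) median position
                              len(cluster),
                              sorted({n for _, n in cluster})))
    return calls
```

Consider a chr8 breakpoint with two supporting reads in sample T1 and one in sample T2. Neither sample reaches the 3-read threshold alone, but the site is reported once support is pooled.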
Affiliation(s)
- Yuchao Xia
- School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Yun Liu
- School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, 100871, China; Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Ruibin Xi
- School of Mathematical Sciences, Peking University, Beijing, 100871, China; Center for Statistical Science, Peking University, Beijing, 100871, China; Center for Data Science, Peking University, Beijing, 100871, China
21
Tetzlaff MT, Curry JL, Ning J, Sagiv O, Kandl TL, Peng B, Bell D, Routbort M, Hudgens CW, Ivan D, Kim TB, Chen K, Eterovic AK, Shaw K, Prieto VG, Yemelyanova A, Esmaeli B. Distinct Biological Types of Ocular Adnexal Sebaceous Carcinoma: HPV-Driven and Virus-Negative Tumors Arise through Nonoverlapping Molecular-Genetic Alterations. Clin Cancer Res 2018; 25:1280-1290. [DOI: 10.1158/1078-0432.ccr-18-1688] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/31/2018] [Revised: 07/25/2018] [Accepted: 11/02/2018] [Indexed: 11/16/2022]
22
Walper SA, Lasarte Aragonés G, Sapsford KE, Brown CW, Rowland CE, Breger JC, Medintz IL. Detecting Biothreat Agents: From Current Diagnostics to Developing Sensor Technologies. ACS Sens 2018; 3:1894-2024. [PMID: 30080029 DOI: 10.1021/acssensors.8b00420] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Indexed: 02/06/2023]
Abstract
Although a fundamental understanding of the pathogenicity of most biothreat agents has been achieved and available treatments have increased substantially over the past decades, these agents still represent a significant public health threat in this age of (bio)terrorism, indiscriminate warfare, pollution, climate change, unchecked population growth, and globalization. The key step in almost all prevention, protection, prophylaxis, post-exposure treatment, and mitigation of any bioagent is early detection. Here, we review available methods for detecting bioagents, including pathogenic bacteria and viruses along with their toxins. We first provide an introduction placing this subject in the historical context of previous naturally occurring outbreaks and efforts to weaponize selected agents, along with definitions and relevant considerations. An overview follows of the detection technologies used in this endeavor and of how they provide data or transduce signal within a sensing configuration. Current "gold" standards for biothreat detection and diagnostics, along with a list of relevant FDA-approved in vitro diagnostic devices, are then discussed to provide an overview of the current state of the art. Given the 2014 outbreak of Ebola virus in West Africa and the 2016 spread of Zika virus in the Americas, we also discuss what constitutes a public health emergency and how new in vitro diagnostic devices are authorized for emergency use in the U.S. The majority of the Review is then subdivided around the sensing of bacterial, viral, and toxin biothreats. Each section includes an overview of the major agents in that class, a detailed cross-section of different sensing methods in development based on assay format or analytical technique, and some discussion of related microfluidic lab-on-a-chip/point-of-care devices. Finally, we give an outlook on how this field will develop from the perspective of the biosensing technology itself and the new emerging threats it may face.
Affiliation(s)
- Scott A. Walper
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- Guillermo Lasarte Aragonés
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- College of Science, George Mason University, Fairfax, Virginia 22030, United States
- Kim E. Sapsford
- OMPT/CDRH/OIR/DMD Bacterial Respiratory and Medical Countermeasures Branch, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Carl W. Brown
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- College of Science, George Mason University, Fairfax, Virginia 22030, United States
- Clare E. Rowland
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- National Research Council, Washington, D.C. 20036, United States
- Joyce C. Breger
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- Igor L. Medintz
- Center for Bio/Molecular Science and Engineering, Code 6900, U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
23
Bhuvaneshwar K, Song L, Madhavan S, Gusev Y. viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors. Front Microbiol 2018; 9:1172. [PMID: 29922260 PMCID: PMC5996193 DOI: 10.3389/fmicb.2018.01172] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/26/2018] [Accepted: 05/15/2018] [Indexed: 01/05/2023]
Abstract
An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence or infection in actual tumor samples is generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present viGEN, an open-source bioinformatics pipeline that allows not only the detection and quantification of viral RNA but also the detection of variants in viral transcripts. The pipeline includes four major modules:

1. The first module aligns reads and filters out human RNA sequences.
2. The second maps the remaining unaligned reads against the reference genomes of all known and sequenced human viruses and counts them.
3. The third quantifies read counts at the individual viral-gene level, allowing downstream differential expression analysis of viral genes between case and control groups.
4. The fourth calls variants in these viruses.

To the best of our knowledge, no publicly available pipeline or package provides this type of complete analysis in one open-source package. We applied the viGEN pipeline to two case studies. We first demonstrate the pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis of a small, focused cohort of TCGA liver cancer patients, including viral-gene quantification, viral-variant extraction, and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between groups of patients and to connect them to clinical outcome. Our analyses show that we successfully detected human papillomavirus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrate similar sensitivity and specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset. The results corresponded with published literature in terms of detection rate and the impact of several known variants of the HBV genome. The pipeline is generalizable and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. It could also be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/.
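The third module's gene-level quantification can be sketched as an interval lookup. Each read aligned to a viral genome is assigned to the annotated gene that contains its start position, and counts are tallied per gene. This is a simplified illustration, not viGEN's code. It assumes 1-based inclusive, non-overlapping gene coordinates. Real viral genomes often contain overlapping genes, which dedicated counting tools must handle explicitly.

```python
from bisect import bisect_right


def count_reads_per_gene(read_starts, genes):
    """Hypothetical sketch of gene-level read counting on one viral genome.

    read_starts: iterable of 1-based alignment start positions on the viral genome.
    genes: list of (gene_name, start, end), 1-based inclusive and non-overlapping
           (a simplifying assumption).
    Each read is assigned to the gene containing its start position; reads that
    start outside every gene are ignored.
    """
    genes = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in genes]
    counts = {name: 0 for name, _, _ in genes}
    for pos in read_starts:
        i = bisect_right(starts, pos) - 1   # last gene starting at or before pos
        if i >= 0 and pos <= genes[i][2]:   # pos also falls before that gene's end
            counts[genes[i][0]] += 1
    return counts
```

The per-gene counts for cases and controls would then feed a differential expression tool. Gene names and coordinates passed to this function are placeholders, not a real viral annotation.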
Affiliation(s)
- Krithika Bhuvaneshwar
- Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Lei Song
- Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Subha Madhavan
- Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
- Yuriy Gusev
- Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States
24
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022]
Abstract
Metagenomics offers opportunities for clinical and public health virology by providing a way to assess the complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to be developed. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. In a literature review, we evaluated 49 published computational classification workflows for virus metagenomics. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease of use and validation experiments. Performance scores from previous benchmarks were summarized, and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees to help virologists select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
25
Cao J, Li D. Searching for human oncoviruses: Histories, challenges, and opportunities. J Cell Biochem 2018; 119:4897-4906. [PMID: 29377246 DOI: 10.1002/jcb.26717] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/19/2018] [Accepted: 01/24/2018] [Indexed: 01/05/2023]
Abstract
Oncoviruses contribute significantly to the cancer burden. A century of tumor virology studies has led to the discovery of seven well-accepted human oncoviruses, cumulatively responsible for approximately 15% of human cancer cases. Virus-caused cancers are largely preventable through vaccination. Identifying additional oncoviruses and virus-caused tumors will advance cancer prevention and precision medicine, benefiting affected individuals and society as a whole. The historic success of finding human oncoviruses provides a unique lesson for directing new research efforts in the post-sequencing era. Combining the experience of these pioneering studies with emerging high-throughput techniques will certainly accelerate new discoveries and advance our knowledge of the remaining human oncoviruses.
Affiliation(s)
- Jian Cao
- Department of Pathology, Yale University, New Haven, Connecticut
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont; Department of Computer Science, University of Vermont, Burlington, Vermont; Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, Vermont; University of Vermont Cancer Center, University of Vermont, Burlington, Vermont
26
Gannon OM, Antonsson A, Bennett IC, Saunders NA. Viral infections and breast cancer - A current perspective. Cancer Lett 2018; 420:182-189. [PMID: 29410005 DOI: 10.1016/j.canlet.2018.01.076] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 10/02/2017] [Revised: 01/08/2018] [Accepted: 01/31/2018] [Indexed: 01/25/2023]
Abstract
Sporadic human breast cancer is the most common cancer to afflict women. Since the discovery, decades ago, of the oncogenic mouse mammary tumour virus, there has been significant interest in the potential aetiological role of infectious agents in sporadic human breast cancer. To address this, many studies have examined the presence of viruses (e.g. papillomaviruses, herpesviruses and retroviruses), endogenous retroviruses and, more recently, microbes, as a means of implicating them in the aetiology of human breast cancer. Such studies have generated conflicting experimental and clinical reports on the role of infection in breast cancer. This review evaluates the current evidence for a productive oncogenic viral infection in human breast cancer, with a focus on the integration of sensitive and specific next generation sequencing technologies with pathogen discovery. Collectively, the majority of the recent literature using the more powerful next generation sequencing technologies fails to support an oncogenic viral infection being involved in disease causality in breast cancer. On balance, the weight of the current experimental evidence supports the conclusion that viral infection is unlikely to play a significant role in the aetiology of breast cancer.
Affiliation(s)
- O M Gannon
- University of Queensland Diamantina Institute, The Faculty of Medicine, The University of Queensland, Brisbane, Australia
- A Antonsson
- Department of Population Health, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland 4006, Australia; School of Medicine, The University of Queensland, Herston Road, Herston, Queensland 4006, Australia
- I C Bennett
- School of Medicine, The University of Queensland, Herston Road, Herston, Queensland 4006, Australia; Private Practice, The Wesley and St Andrews Hospital, Auchenflower 4066, Australia
- N A Saunders
- University of Queensland Diamantina Institute, The Faculty of Medicine, The University of Queensland, Brisbane, Australia
27
Saeb ATM. Current Bioinformatics resources in combating infectious diseases. Bioinformation 2018; 14:31-35. [PMID: 29497257 PMCID: PMC5818640 DOI: 10.6026/97320630014031] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/23/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/13/2022]
Abstract
Bioinformatics tools and techniques for analyzing next-generation sequencing (NGS) data are increasingly used for the diagnosis and monitoring of infectious diseases. This review covers the application of bioinformatics tools, commonly used databases and NGS data in clinical microbiology, focusing on molecular identification, genotyping, microbiome research, antimicrobial resistance analysis and detection of unknown disease-associated pathogens in clinical specimens. It documents available bioinformatics resources and databases that are used by medical microbiology scientists and physicians to control emerging infectious pathogens.
Affiliation(s)
- Amr T. M. Saeb
- Genetics and Biotechnology Department, Strategic Center for Diabetes Research, College of Medicine, King Saud University, KSA
28
Jones S, Baizan-Edge A, MacFarlane S, Torrance L. Viral Diagnostics in Plants Using Next Generation Sequencing: Computational Analysis in Practice. Front Plant Sci 2017; 8:1770. [PMID: 29123534 PMCID: PMC5662881 DOI: 10.3389/fpls.2017.01770] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 04/11/2017] [Accepted: 09/28/2017] [Indexed: 05/04/2023]
Abstract
Viruses cause significant yield and quality losses in a wide variety of cultivated crops. Hence, the detection and identification of viruses is a crucial facet of successful crop production and of great significance for world food security. Whilst the adoption of molecular techniques such as RT-PCR has increased the speed and accuracy of viral diagnostics, such techniques only allow the detection of known viruses, i.e., each test is specific to one or a small number of related viruses. Therefore, unknown viruses can be missed, and testing can be slow and expensive if molecular tests are unavailable. Methods for the simultaneous detection of multiple viruses have been developed, and next generation sequencing (NGS) is now a principal focus of this area, as it enables unbiased and hypothesis-free testing of plant samples. The development of NGS protocols capable of detecting multiple known and emergent viruses in infected material is proving to be a major advance for crops, nuclear stocks or imported plants and germplasm, in which disease symptoms are absent, unspecific or only triggered by multiple viruses. Researchers want to answer the question "how many different viruses are present in this crop plant?" without knowing what they are looking for, and RNA-sequencing (RNA-seq) of plant material allows this question to be addressed. As well as efficient nucleic acid extraction and enrichment protocols, virus detection using RNA-seq requires fast and robust bioinformatics methods for host sequence removal and virus classification. In this review, recent studies that use RNA-seq for virus detection in a variety of crop plants are discussed with specific emphasis on the computational methods implemented. The main features of a number of specific bioinformatics workflows developed for virus detection from NGS data are also outlined, and possible reasons why these have not yet been widely adopted are discussed. The review concludes by discussing the future directions of this field, including the use of bioinformatics tools for virus detection deployed in analytical environments using cloud computing.
Affiliation(s)
- Susan Jones
- Information and Computational Science Group, The James Hutton Institute, Dundee, United Kingdom
- Amanda Baizan-Edge
- School of Biology, The University of St Andrews, St Andrews, United Kingdom
- Stuart MacFarlane
- Cell and Molecular Science Group, The James Hutton Institute, Dundee, United Kingdom
- Lesley Torrance
- School of Biology, The University of St Andrews, St Andrews, United Kingdom
- Cell and Molecular Science Group, The James Hutton Institute, Dundee, United Kingdom
29
Cox JW, Ballweg RA, Taft DH, Velayutham P, Haslam DB, Porollo A. A fast and robust protocol for metataxonomic analysis using RNAseq data. Microbiome 2017; 5:7. [PMID: 28103917 PMCID: PMC5244565 DOI: 10.1186/s40168-016-0219-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/21/2016] [Accepted: 12/05/2016] [Indexed: 05/03/2023]
Abstract
Background: Metagenomics is a rapidly emerging field that aims to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies are attracting more interest. To make metatranscriptomic data usable for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data. Results: We present a new protocol, IMSA+A, for accurate taxonomic classification based on metatranscriptome data of any read length that can efficiently and robustly identify bacteria, fungi, and viruses in the same sample. The new protocol improves accuracy by using a conservative reference database, employing a new counting scheme, and assembling shotgun reads. Assembly also reduces analysis runtime. Simulated data were used to evaluate the protocol by permuting common experimental variables. When applied to real metatranscriptome data from mouse intestines colonized by the altered Schaedler flora (ASF), the protocol showed superior performance in detecting the microorganisms compared with existing metataxonomics tools. IMSA+A is available at https://github.com/JeremyCoxBMI/IMSA-A. Conclusions: The developed protocol addresses the need for taxonomic classification from RNAseq data. RNAseq reads that were previously not used because they did not map to a reference genome can now be used to gather taxonomic information about the microbiota present in a biological sample without additional sequencing. Any metatranscriptome pipeline that includes read assembly can add this analysis at minimal additional compute cost. The new protocol also creates an opportunity to revisit old metatranscriptome data in which taxonomic content may be important but was not analyzed.
Affiliation(s)
- Jeremy W Cox
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Richard A Ballweg
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Diana H Taft
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Prakash Velayutham
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- David B Haslam
- Division of Infectious Diseases, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- Aleksey Porollo
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
30
Usman T, Hadlich F, Demasius W, Weikard R, Kühn C. Unmapped reads from cattle RNAseq data: A source for missing and misassembled sequences in the reference assemblies and for detection of pathogens in the host. Genomics 2017; 109:36-42. [DOI: 10.1016/j.ygeno.2016.11.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/15/2016] [Revised: 11/21/2016] [Accepted: 11/28/2016] [Indexed: 11/15/2022]
31
Bullman S, Meyerson M, Kostic AD. Emerging Concepts and Technologies for the Discovery of Microorganisms Involved in Human Disease. Annu Rev Pathol 2016; 12:217-244. [PMID: 27959634 DOI: 10.1146/annurev-pathol-012615-044305] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
Abstract
Established infectious agents continue to be a major cause of human morbidity and mortality worldwide. However, the causative agent remains unknown for a wide range of diseases; many of these are suspected to be attributable to yet undiscovered microorganisms. The advent of unbiased high-throughput sequencing and bioinformatics has enabled rapid identification and molecular characterization of known and novel infectious agents in human disease. An exciting era of microbe discovery, now under way, holds great promise for the improvement of global health via the development of antimicrobial therapies, vaccination strategies, targeted public health measures, and probiotic-based preventions and therapies. Here, we review the history of pathogen discovery, discuss improvements and clinical applications for the detection of microbially associated diseases, and explore the challenges and strategies for establishing causation in human disease.
Affiliation(s)
- Susan Bullman
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Matthew Meyerson
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142; Harvard Medical School, Boston, Massachusetts 02115
- Aleksandar D Kostic
- Research Division, Joslin Diabetes Center, Boston, Massachusetts 02215; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115
32
Divergent viral presentation among human tumors and adjacent normal tissues. Sci Rep 2016; 6:28294. [PMID: 27339696 PMCID: PMC4919655 DOI: 10.1038/srep28294] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 03/05/2016] [Accepted: 05/26/2016] [Indexed: 12/13/2022]
Abstract
We applied a newly developed bioinformatics system called VirusScan to investigate the viral basis of 6,813 human tumors and 559 adjacent normal samples across 23 cancer types and identified 505 virus-positive samples with distinctive, organ-system- and cancer-type-specific distributions. We found that herpesviruses (e.g., HHV4, HHV5, and HHV6), which are highly prevalent across cancers of the digestive tract, showed significantly higher abundances in tumor than in adjacent normal samples, supporting their association with these cancers. We also found three HPV16-positive samples in brain lower grade glioma (LGG). Further, recurrent HBV integration at the KMT2B locus is present in three liver tumors but absent in their matched adjacent normal samples, indicating that host driver genetic alterations induced by viral integration are required on top of viral oncogene expression for the initiation and progression of liver hepatocellular carcinoma. Notably, viral integrations were found in many genes, including novel recurrent HPV integrations at PTPN13 in cervical cancer. Finally, we observed a set of HHV4 and HBV variants strongly associated with ethnic groups, likely due to viral sequence evolution under environmental influences. These findings provide important new insights into the roles of viruses in tumor initiation and progression and into potential new therapeutic targets.
33
Li Y, Wang H, Nie K, Zhang C, Zhang Y, Wang J, Niu P, Ma X. VIP: an integrated pipeline for metagenomics of virus identification and discovery. Sci Rep 2016; 6:23774. [PMID: 27026381 PMCID: PMC4824449 DOI: 10.1038/srep23774] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 09/27/2015] [Accepted: 03/15/2016] [Indexed: 12/19/2022]
Abstract
Identification and discovery of viruses using next-generation sequencing (NGS) technology is a fast-developing area with potentially wide application in clinical diagnostics, public health monitoring, and novel virus discovery. However, the enormous volume of sequence data generated by NGS studies poses major challenges in both the accuracy and the speed of analysis. Here we describe VIP (“Virus Identification Pipeline”), a one-touch computational pipeline for virus identification and discovery from metagenomic NGS data. VIP performs the following steps: (i) mapping and filtering out background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, and (iii) multiple k-mer-based de novo assembly and phylogenetic analysis to provide evolutionary insight. We validated the feasibility and accuracy of this pipeline with sequencing results from various types of clinical samples and public datasets. VIP has also contributed to timely virus diagnosis (~10 min) in acutely ill patients, demonstrating its potential for unbiased NGS-based clinical studies that demand a short turnaround time. VIP is released under GPLv3 and is available for free download at: https://github.com/keylabivdc/VIP.
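To make step (i) concrete, here is a minimal Python sketch of background subtraction. It is not VIP's implementation: VIP filters background reads by mapping, whereas this sketch swaps in a simpler exact k-mer screen, and the toy background sequence, the k-mer size, and the `max_shared` cutoff of 0.5 are illustrative assumptions.

```python
from typing import Iterable, Iterator

def kmers(seq: str, k: int) -> set[str]:
    """All overlapping length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_background_index(background_seqs: Iterable[str], k: int) -> set[str]:
    """Collect every k-mer found in the background (e.g. host) sequences."""
    index: set[str] = set()
    for seq in background_seqs:
        index |= kmers(seq, k)
    return index

def subtract_background(reads: dict[str, str], index: set[str], k: int,
                        max_shared: float = 0.5) -> Iterator[str]:
    """Yield IDs of reads whose fraction of background k-mers is at most max_shared."""
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:  # read shorter than k: cannot be screened, so drop it
            continue
        shared = len(read_kmers & index) / len(read_kmers)
        if shared <= max_shared:
            yield read_id

# Toy example (hypothetical sequences): r1 is entirely background, r2 is not.
index = build_background_index(["ACGTACGTACGTACGTACGT"], k=5)
reads = {"r1": "ACGTACGTAC", "r2": "TTGGCCAATTGG"}
print(list(subtract_background(reads, index, k=5)))  # ['r2']
```

An exact k-mer screen is cheap and alignment-free, but it only removes reads that share exact matches with the background; a realistic pipeline would index the full host genome rather than a toy string, and would pass the surviving reads on to the classification step (ii).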
Affiliation(s)
- Yang Li
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China; National Engineering Research Center for Beijing Biochip Technology, Beijing 102206, China
- Hao Wang
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China; Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, 41345, Sweden
- Kai Nie
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
- Chen Zhang
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
- Yi Zhang
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
- Ji Wang
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
- Peihua Niu
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
- Xuejun Ma
- Key Laboratory of Medical Virology, Ministry of Health; National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
34
Next-generation sequencing of elite berry germplasm and data analysis using a bioinformatics pipeline for virus detection and discovery. Methods Mol Biol 2016; 1302:301-13. [PMID: 25981263 DOI: 10.1007/978-1-4939-2620-6_22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 02/09/2023]
Abstract
Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus, and Vaccinium) are known hosts for more than 70 viruses, and new ones are identified continually. In modern berry cultivars, viruses tend to be asymptomatic in single infections, and symptoms develop only after plants accumulate multiple viruses. Most certification programs are based on visual observations. Infected, asymptomatic material may be propagated in the nursery system and shipped to farms, where plants acquire additional viruses and develop symptoms. This practice may result in disease epidemics with great impact on producers and the natural ecosystem alike. In this chapter, we present work that allows the detection of known viruses and the discovery of new ones in elite germplasm, with the potential to greatly reduce virus dispersal associated with the movement of propagation material.
35
Friis-Nielsen J, Kjartansdóttir KR, Mollerup S, Asplund M, Mourier T, Jensen RH, Hansen TA, Rey-Iglesia A, Richter SR, Nielsen IB, Alquezar-Planas DE, Olsen PVS, Vinner L, Fridholm H, Nielsen LP, Willerslev E, Sicheritz-Pontén T, Lund O, Hansen AJ, Izarzugaza JMG, Brunak S. Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers. Viruses 2016; 8:E53. [PMID: 26907326 PMCID: PMC4776208 DOI: 10.3390/v8020053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/30/2015] [Revised: 01/29/2016] [Accepted: 02/05/2016] [Indexed: 12/17/2022]
Abstract
Virus discovery from high-throughput sequencing data often follows a bottom-up approach in which taxonomic annotation takes place prior to association with disease. Although effective in some cases, this approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species-independent pipeline that uses sequence clustering to identify nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated with biological, methodological, or technical features, with the aim of identifying novel pathogens or plausible contaminants that may be associated with a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space in which novel pathogens can be identified.
Affiliation(s)
- Jens Friis-Nielsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- Kristín Rós Kjartansdóttir
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Sarah Mollerup
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Maria Asplund
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Tobias Mourier
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Randi Holm Jensen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Thomas Arn Hansen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Alba Rey-Iglesia
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Stine Raith Richter
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Ida Broman Nielsen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- David E Alquezar-Planas
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Pernille V S Olsen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Lasse Vinner
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Helena Fridholm
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Lars Peter Nielsen
- Department of Autoimmunology and Biomarkers, Statens Serum Institut, DK-2300 Copenhagen S, Denmark.
- Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Thomas Sicheritz-Pontén
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- Anders Johannes Hansen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark.
- Jose M G Izarzugaza
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- Søren Brunak
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
- NNF Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark.
36
Mulcahy-O'Grady H, Workentine ML. The Challenge and Potential of Metagenomics in the Clinic. Front Immunol 2016; 7:29. [PMID: 26870044 PMCID: PMC4737888 DOI: 10.3389/fimmu.2016.00029] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/14/2015] [Accepted: 01/19/2016] [Indexed: 12/27/2022]
Abstract
The bacteria, fungi, and viruses that live on and in us have a tremendous impact on our day-to-day health and are often linked to many diseases, including autoimmune disorders and infections. Diagnosing and treating these disorders relies on accurate identification and characterization of the microbial community. Current sequencing technologies allow sequencing of the entire nucleic acid complement of a sample, providing an accurate snapshot of the community members present as well as the full genetic potential of that microbial community. A number of clinical applications stand to benefit from these data sets, such as the rapid identification of pathogens present in a sample. Other applications include the identification of antibiotic-resistance genes, the diagnosis and treatment of gastrointestinal disorders, and many other diseases associated with bacterial, viral, and fungal microbiomes. Metagenomics also allows the physician to probe more complex phenotypes, such as microbial dysbiosis in intestinal disorders and disruptions of the skin microbiome that may be associated with skin disorders. Many of these disorders are not associated with a single pathogen but emerge from complex ecological interactions within the microbiota. Currently, we understand very little about these complex phenotypes, yet they are clearly important, and in some cases, as with fecal microbiota transplants for Clostridium difficile infections, treating the patient's microbiome is effective. Here, we give an overview of metagenomics and discuss a number of areas where metagenomics is applicable in the clinic and the progress being made in these areas. These include (1) the identification of unknown pathogens and of pathogens that are particularly hard to culture, (2) the use of functional information and gene content to understand complex infections such as Clostridium difficile, and (3) the prediction of antimicrobial resistance in the community using genetic determinants of resistance identified from the sequencing data. All of these applications rely on sophisticated computational tools, and we also discuss the importance of skilled bioinformatic support for the implementation and use of metagenomics in the clinic.
Affiliation(s)
- Heidi Mulcahy-O'Grady
- Infection Prevention and Control, Alberta Health Services, and Faculty of Medicine, Calgary, AB, Canada
37
Whitacre LK, Tizioto PC, Kim J, Sonstegard TS, Schroeder SG, Alexander LJ, Medrano JF, Schnabel RD, Taylor JF, Decker JE. What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual. BMC Genomics 2015; 16:1114. [PMID: 26714747 PMCID: PMC4696311 DOI: 10.1186/s12864-015-2313-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/26/2015] [Accepted: 12/15/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. RESULTS We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or are not known to infect taurine cattle; the reference animal appears to have been host to unsequenced sister species. CONCLUSIONS We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and to detect sequences indicative of parasitic or commensal organisms.
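Exploring unmapped reads starts with pulling them out of the alignment output. A minimal Python sketch, assuming uncompressed SAM text as input: per the SAM specification, FLAG bit 0x4 marks an unmapped segment, and the sketch skips secondary (0x100) and supplementary (0x800) records so each read is reported at most once. For BAM files, `samtools view -f 4` performs the same selection; the downstream de novo assembly and BLAST search described above are not shown, and the example records are made up.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment

def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (QNAME, SEQ) for every primary SAM record flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # fewer than the 11 mandatory SAM columns: skip
            continue
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & UNMAPPED:
            yield fields[0], fields[9]  # QNAME and SEQ columns

# Hypothetical records: read1 is mapped, read2 is unmapped.
sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]
print(list(unmapped_reads(sam)))  # [('read2', 'TTGACCAA')]
```

The yielded sequences can be written to FASTA and handed to any de novo assembler before contigs are searched against a reference database.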
Affiliation(s)
- Lynsey K Whitacre
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA; Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
- Polyana C Tizioto
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA; Embrapa Southeast Livestock, São Carlos, São Paulo, 13560-970, Brazil.
- JaeWoo Kim
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
- Tad S Sonstegard
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, 20705, USA; Recombinetics Inc., 1246 University Ave W #301, St Paul, MN, 55104, USA.
- Steven G Schroeder
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, 20705, USA.
- Juan F Medrano
- Department of Animal Science, University of California-Davis, Davis, CA, 95616, USA.
- Robert D Schnabel
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA; Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
- Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
- Jared E Decker
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA; Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA.
38
No association between HPV positive breast cancer and expression of human papilloma viral transcripts. Sci Rep 2015; 5:18081. [PMID: 26658849 PMCID: PMC4677295 DOI: 10.1038/srep18081] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/08/2015] [Accepted: 11/11/2015] [Indexed: 12/15/2022]
Abstract
Infectious agents are thought to be responsible for approximately 16% of cancers worldwide; however, there are mixed reports in the literature as to the prevalence and potential pathogenicity of viruses in breast cancer. Furthermore, most studies to date have focused primarily on viral DNA rather than on the expression of viral transcripts. We screened a large cohort of fresh-frozen breast cancer and normal breast tissue specimens collected from patients in Australia for the presence of human papillomavirus (HPV) DNA, finding an overall HPV prevalence of 16% in malignant and 10% in non-malignant tissue. Samples that were positive for HPV DNA by nested PCR were screened by RNA sequencing for the presence of transcripts of viral origin, using three different bioinformatic pipelines. We did not find any evidence of HPV or other viral transcripts in HPV DNA-positive samples. We also screened publicly available breast RNA-seq data sets for viral transcripts and found no evidence of their expression (HPV or otherwise) in those data sets either. These data suggest that transcription of viral genomes is unlikely to be a significant factor in breast cancer pathogenesis.
39
Lefterova MI, Suarez CJ, Banaei N, Pinsky BA. Next-Generation Sequencing for Infectious Disease Diagnosis and Management: A Report of the Association for Molecular Pathology. J Mol Diagn 2015; 17:623-34. [PMID: 26433313 DOI: 10.1016/j.jmoldx.2015.07.004] [Citation(s) in RCA: 125] [Impact Index Per Article: 12.5] [Received: 03/30/2015] [Revised: 05/27/2015] [Accepted: 07/02/2015] [Indexed: 12/31/2022]
Abstract
Next-generation sequencing (NGS) technologies are increasingly being used for diagnosis and monitoring of infectious diseases. Herein, we review the application of NGS in clinical microbiology, focusing on genotypic resistance testing, direct detection of unknown disease-associated pathogens in clinical specimens, investigation of microbial population diversity in the human host, and strain typing. We have organized the review into three main sections: i) applications in clinical virology, ii) applications in clinical bacteriology, mycobacteriology, and mycology, and iii) validation, quality control, and maintenance of proficiency. Although NGS holds enormous promise for clinical infectious disease testing, many challenges remain, including automation, standardizing technical protocols and bioinformatics pipelines, improving reference databases, establishing proficiency testing and quality control measures, and reducing cost and turnaround time, all of which would be necessary for widespread adoption of NGS in clinical microbiology laboratories.
Affiliation(s)
- Martina I Lefterova
- Association for Molecular Pathology Next-Generation Sequencing in Infectious Disease Work Group, Bethesda, Maryland; Department of Pathology, Stanford University School of Medicine, Stanford, California
- Carlos J Suarez
- Association for Molecular Pathology Next-Generation Sequencing in Infectious Disease Work Group, Bethesda, Maryland; Department of Pathology, Stanford University School of Medicine, Stanford, California
- Niaz Banaei
- Association for Molecular Pathology Next-Generation Sequencing in Infectious Disease Work Group, Bethesda, Maryland; Department of Pathology, Stanford University School of Medicine, Stanford, California; Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, California
- Benjamin A Pinsky
- Association for Molecular Pathology Next-Generation Sequencing in Infectious Disease Work Group, Bethesda, Maryland; Department of Pathology, Stanford University School of Medicine, Stanford, California; Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, California
40
Vy-PER: eliminating false positive detection of virus integration events in next generation sequencing data. Sci Rep 2015; 5:11534. [PMID: 26166306 PMCID: PMC4499804 DOI: 10.1038/srep11534] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 11/10/2014] [Accepted: 05/07/2015] [Indexed: 11/10/2022]
Abstract
Several pathogenic viruses, such as hepatitis B virus and human immunodeficiency virus, may integrate into the host genome. These virus/host integrations are detectable using paired-end next-generation sequencing. However, the few true virus integrations expected may be difficult to distinguish from the noise of many false-positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline, termed Vy-PER (Virus integration detection bY Paired End Reads), outperforms existing similar tools in speed and accuracy. We analysed whole-genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by recently reported virus integrations at genomic rearrangement sites and their association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients found no significant evidence of integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage), and only our method detects singleton human-phiX174 chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low levels of virus integration as well as non-integrated viruses.
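The raw signal that paired-end integration detection starts from, read pairs whose two mates align to different genomes, can be sketched as follows. This covers only candidate flagging, not Vy-PER's specificity filters, which the abstract does not detail. It assumes alignments have already been reduced to one reference name per mate, that human contigs use a hypothetical `chr` name prefix, and it sets human-phiX174 pairs aside as technical artifacts, echoing the optical-error chimeras described above.

```python
from collections import Counter
from typing import Iterable

HUMAN_PREFIX = "chr"         # hypothetical naming convention for human contigs
ARTIFACT_REFS = {"phiX174"}  # Illumina spike-in control: human-phiX pairs are artifacts

def classify_pair(ref1: str, ref2: str, viral_refs: set[str]) -> str:
    """Label a read pair from the reference names its two mates aligned to."""
    refs = {ref1, ref2}
    has_human = any(r.startswith(HUMAN_PREFIX) for r in refs)
    if has_human and refs & ARTIFACT_REFS:
        return "artifact"   # human-phiX174 chimera: flag rather than report
    if has_human and refs & viral_refs:
        return "candidate"  # one mate human, the other viral: possible integration
    return "other"          # both mates on the same genome, or no human mate

def summarize(pairs: Iterable[tuple[str, str]], viral_refs: set[str]) -> Counter:
    """Count pair labels across a library."""
    return Counter(classify_pair(a, b, viral_refs) for a, b in pairs)

# Hypothetical per-pair reference assignments.
pairs = [("chr1", "chr1"), ("chr3", "HBV"), ("HBV", "chr17"),
         ("chr7", "phiX174"), ("HBV", "HBV")]
print(summarize(pairs, viral_refs={"HBV", "HIV-1"}))
```

Candidate pairs would then need the kind of filtering the abstract emphasises, since most such chimeras in real data are expected to be false positives.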
41
Parfenov M, Seidman JG. Finding Pathogenic Nucleic Acid Sequences in Next Generation Sequencing Data. Curr Protoc Hum Genet 2015; 86:18.9.1-18.9.10. [PMID: 26132004 DOI: 10.1002/0471142905.hg1809s86] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
Viruses and bacteria are established as among the main causes of human diseases, ranging from hepatitis to cancer. Recently, the presence of such pathogens has been extensively studied using human whole-genome and transcriptome sequencing data. However, detecting and studying pathogens in next-generation sequencing data is demanding in terms of time and computational resources. In this protocol, we give instructions for a simple and quick method to find pathogenic DNA or RNA and to detect possible integration of the pathogen genome into the host genome.
Affiliation(s)
- Michael Parfenov
- Department of Genetics, Harvard Medical School, Boston, Massachusetts
- J G Seidman
- Department of Genetics, Harvard Medical School, Boston, Massachusetts
42
Daly GM, Leggett RM, Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez RH, Caccamo M, Bernal W, Heeney JL. Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data. PLoS One 2015; 10:e0129059. [PMID: 26098299 PMCID: PMC4476701 DOI: 10.1371/journal.pone.0129059] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/10/2014] [Accepted: 05/04/2015] [Indexed: 12/18/2022]
Abstract
The use of next-generation sequencing (NGS) to identify novel viral sequences in eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs after assembly, which make subsequent viral analysis difficult. A comparison of assembly algorithms combined with pre-assembly host-mapping subtraction (using a short-read mapping tool), a k-mer-frequency-based filter and a low-complexity filter has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and with simulated data. Applying these pre-assembly filtering methods significantly reduced the number of assembled contigs (by up to 99.97%). This approach provides a validated method for maximizing viral contig size while reducing the total number of assembled contigs that require downstream analysis as putative viral nucleic acids.
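The abstract does not specify how the low-complexity filter scores reads. One common proxy, swapped in here purely for illustration and not necessarily the authors' choice, is the Shannon entropy of a read's overlapping trinucleotide composition; the 3.0-bit cutoff and the example reads are hypothetical. Homopolymers and short tandem repeats score low and would be removed before assembly.

```python
import math
from collections import Counter

def trinucleotide_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the overlapping trinucleotide composition of seq."""
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not triplets:
        return 0.0
    n = len(triplets)
    # H = sum over triplets of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(triplets).values())

def is_low_complexity(seq: str, min_entropy: float = 3.0) -> bool:
    """Flag a read as low complexity when its entropy is below a hypothetical cutoff."""
    return trinucleotide_entropy(seq) < min_entropy

reads = {
    "polyA": "A" * 30,           # homopolymer: entropy 0 bits
    "repeat": "ACGT" * 10,       # tandem repeat: about 2 bits
    "diverse": "ATGCGTACCTTAGGCATCGATTACGGCTAAGCTTGCA",
}
kept = [name for name, seq in reads.items() if not is_low_complexity(seq)]
print(kept)  # ['diverse']
```

Trinucleotides are used rather than single bases because a dinucleotide repeat such as ACACAC still contains two different bases but only two distinct triplets, so it is caught by the cutoff.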
Affiliation(s)
- Gordon M. Daly
- Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom
- Richard M. Leggett
- The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich, NR4 7UH, United Kingdom
- William Rowe
- Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom
- Samuel Stubbs
- Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom
- Maxim Wilkinson
- Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom
- Mario Caccamo
- The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich, NR4 7UH, United Kingdom
- William Bernal
- Institute of Liver Studies, King's College Hospital, Denmark Hill, London, SE5 9RS, United Kingdom
- Jonathan L. Heeney
- Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom
43
Possible Human Papillomavirus 38 Contamination of Endometrial Cancer RNA Sequencing Samples in The Cancer Genome Atlas Database. J Virol 2015; 89:8967-73. [PMID: 26085148 DOI: 10.1128/jvi.00822-15] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/27/2015] [Accepted: 06/09/2015] [Indexed: 12/17/2022]
Abstract
Viruses are causally associated with a number of human malignancies. In this study, we sought to identify new virus-cancer associations by searching RNA sequencing data sets from >2,000 patients, encompassing 21 cancers from The Cancer Genome Atlas (TCGA), for the presence of viral sequences. In agreement with previous studies, we found human papillomavirus 16 (HPV16) and HPV18 in oropharyngeal cancer and hepatitis B and C viruses in liver cancer. Unexpectedly, however, we found HPV38, a cutaneous HPV type associated with skin cancer, in 32 of 168 endometrial cancer samples. In 12 of the HPV38-positive (HPV38(+)) samples, we observed at least one paired read that mapped to both the human and HPV38 genomes, indicative of viral integration into the host DNA, something not previously demonstrated for HPV38. The expression levels of HPV38 transcripts were relatively low, and all 32 HPV38(+) samples belonged to the same experimental batch of 40 samples, whereas none of the other 128 endometrial carcinoma samples were HPV38(+), raising doubts about the significance of the HPV38 association. Moreover, the HPV38(+) samples contained the same 10 novel single nucleotide variations (SNVs), leading us to hypothesize that one patient was infected with this new isolate of HPV38, which was integrated into his/her genome and may have cross-contaminated other TCGA samples within batch 228. Based on our analysis, we propose guidelines to examine the batch effect, virus expression level, and SNVs as part of next-generation sequencing (NGS) data analysis for evaluating the significance of viral/pathogen sequences in clinical samples.
IMPORTANCE High-throughput RNA sequencing (RNA-Seq), followed by computational analysis, has vastly accelerated the identification of viral and other pathogenic sequences in clinical samples, but cross-contamination during sample processing remains a major problem that can lead to erroneous conclusions. We found sequences of HPV38, a virus not previously associated with this type of cancer, specifically in RNA-Seq samples from TCGA endometrial cancer patients. However, multiple lines of evidence suggest possible cross-contamination in these samples, which were processed together in the same batch. Despite this potential cross-contamination, our data indicate that we have detected a new isolate of HPV38 that appears to be integrated into the human genome. We also provide general guidelines for computational detection and interpretation of pathogen-disease associations.
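The batch-effect check proposed above can be quantified with a one-sided Fisher exact test on the counts reported in the abstract (32 of 40 batch-228 samples HPV38-positive versus 0 of the other 128). This is a minimal standard-library sketch; the abstract does not say the authors used this particular test, so treat it as one reasonable way to ask how surprising it is that every positive sample falls in a single batch.

```python
from math import comb

def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for enrichment of positives in one group.

    2x2 table, rows = in batch / other batches, columns = positive / negative:
        [[a, b],
         [c, d]]
    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    in_batch = a + b
    positives = a + c
    n = a + b + c + d
    denom = comb(n, in_batch)
    return sum(comb(positives, x) * comb(n - positives, in_batch - x)
               for x in range(a, min(in_batch, positives) + 1)) / denom

# Counts from the abstract: batch 228 has 32 positive and 8 negative samples;
# the other 128 endometrial carcinoma samples are all negative.
p = fisher_enrichment_p(32, 8, 0, 128)
print(f"one-sided p = {p:.3g}")
```

A vanishingly small p-value shows that the positives cluster by batch far beyond chance, which is consistent with, but does not by itself prove, cross-contamination; the shared SNVs and low expression levels are the complementary lines of evidence the authors cite.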
44
Chandrani P, Kulkarni V, Iyer P, Upadhyay P, Chaubal R, Das P, Mulherkar R, Singh R, Dutt A. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome. Br J Cancer 2015; 112:1958-1965. [PMID: 25973533 PMCID: PMC4580395 DOI: 10.1038/bjc.2015.121] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 07/31/2014] [Revised: 03/03/2015] [Accepted: 03/07/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND Human papillomavirus (HPV) is the most common cause of all virus-associated human cancers. Here, we describe the first graphical user interface (GUI)-based automated tool, 'HPVDetector', designed for non-computational biologists and dedicated to the detection and annotation of the HPV genome from next-generation sequencing data sets. METHODS We developed a custom reference genome that comprises the human chromosomes along with the annotated genomes of 143 HPV types as pseudochromosomes. The tool runs in one of two user-selected modes: a 'quick mode' to identify the HPV types present and an 'integration mode' to determine the genomic location of the integration site. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. HPVDetector is publicly available for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. RESULTS Based on our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data sets, we identified HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of the viral gene E7, a known oncogene, at the known sites 17q21, 3q27, 7q35 and Xq28 and at novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 was found to be mutually exclusive compared with low-risk HPV71. CONCLUSIONS HPVDetector is a simple yet precise and robust tool for detecting HPV in tumour samples using a variety of next-generation sequencing platforms, including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode), along with a GUI, widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.
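Because HPV genomes are added to the reference as extra pseudochromosomes, a quick presence call reduces to counting reads that land on each HPV contig. The following is a minimal sketch of that counting step only, not HPVDetector's code; it assumes alignments are already reduced to (read ID, contig) pairs, that HPV contigs carry a hypothetical `HPV` name prefix, and that a hypothetical minimum of 10 distinct reads is required to call a type present.

```python
from collections import Counter
from typing import Iterable

def quick_mode_calls(alignments: Iterable[tuple[str, str]],
                     min_reads: int = 10,
                     viral_prefix: str = "HPV") -> dict[str, int]:
    """Return {HPV contig: distinct read count} for contigs reaching min_reads.

    alignments: (read_id, contig) pairs from a combined human + HPV reference.
    Reads on human chromosomes are ignored; a repeated (read, contig) pair counts once.
    """
    seen: set[tuple[str, str]] = set()
    counts: Counter = Counter()
    for read_id, contig in alignments:
        if contig.startswith(viral_prefix) and (read_id, contig) not in seen:
            seen.add((read_id, contig))
            counts[contig] += 1
    return {contig: n for contig, n in counts.items() if n >= min_reads}

# Hypothetical alignments; r2 appears twice on HPV16 but is counted once.
alignments = [("r1", "chr17"), ("r2", "HPV16"), ("r3", "HPV16"),
              ("r2", "HPV16"), ("r4", "HPV71")]
print(quick_mode_calls(alignments, min_reads=2))  # {'HPV16': 2}
```

A combined reference lets human and viral sequences compete for each read during alignment, so reads are not assigned to an HPV contig merely because the human genome was absent from the index.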
Affiliation(s)
- P Chandrani
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- V Kulkarni
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Iyer
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Upadhyay
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Chaubal
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Das
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Mulherkar
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Singh
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- A Dutt
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
45
Scheuch M, Höper D, Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinformatics 2015; 16:69. [PMID: 25886935 PMCID: PMC4351923 DOI: 10.1186/s12859-015-0503-6] [Received: 07/03/2014] [Accepted: 02/20/2015]
Abstract
BACKGROUND Fuelled by the advent and subsequent development of next-generation sequencing technologies, metagenomics has become a powerful tool for analysing microbial communities in both research and diagnostics. The biggest challenge is extracting relevant information from the huge sequence datasets that metagenomics studies generate. Although many tools are available, data analysis remains a bottleneck. RESULTS To overcome this bottleneck, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS taxonomically assigns every individual read in a dataset by cascading different sequence analyses, performed with various software applications, with decreasing stringency of assignment. Once the analyses are complete, the results are summarised in a clearly structured, taxonomically organised result protocol. Comparisons with other metagenomics data analysis tools on simulated sequencing read datasets demonstrated the high accuracy and performance of RIEMS. CONCLUSIONS RIEMS has the potential to fill the gap that still exists in data analysis for metagenomics studies. Its usefulness for analysing genuine sequencing datasets was demonstrated in 2011, when an early version of RIEMS was used to detect the orthobunyavirus sequences that led to the discovery of Schmallenberg virus.
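The cascading idea can be sketched in Python: each read is assigned at the most stringent step that succeeds, and the rest are passed to progressively more permissive steps. RIEMS itself chains different external sequence-analysis programs. Here, a shared k-mer fraction at decreasing k and thresholds stands in for those analyses. The stage parameters, reference names and reads are hypothetical.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(read, ref_kmers, k):
    """Return (taxon, fraction of the read's k-mers found in that reference)."""
    rk = kmers(read, k)
    if not rk or not ref_kmers:
        return None, 0.0
    scores = {t: len(rk & ks) / len(rk) for t, ks in ref_kmers.items()}
    taxon = max(scores, key=scores.get)
    return taxon, scores[taxon]


def cascade_classify(reads, refs, stages=((11, 0.9), (7, 0.6), (5, 0.3))):
    """Assign reads stage by stage, from most to least stringent (k, min_fraction)."""
    assigned = {}
    remaining = dict(reads)
    for level, (k, min_frac) in enumerate(stages, start=1):
        ref_kmers = {t: kmers(s, k) for t, s in refs.items()}
        unassigned = {}
        for name, seq in remaining.items():
            taxon, frac = best_hit(seq, ref_kmers, k)
            if taxon is not None and frac >= min_frac:
                assigned[name] = (taxon, level)  # record which stage succeeded
            else:
                unassigned[name] = seq  # hand over to the next, laxer stage
        remaining = unassigned
    return assigned, sorted(remaining)


# Hypothetical references and reads; "N" marks an ambiguous base call.
refs = {"virusA": "ATGCGTACCTTAGGCATCGATTGACCGTAA",
        "virusB": "GGGGGCCCCCGGGGGCCCCC"}
reads = {"exact": "ATGCGTACCTTAGGCATCGA",
         "one_N": "TACCTTAGGCNTCGATTGAC",
         "junk": "NNNNNNNNNNNN"}
assigned, leftover = cascade_classify(reads, refs)
print(assigned)  # {'exact': ('virusA', 1), 'one_N': ('virusA', 3)}
print(leftover)  # ['junk']
```

The read with one ambiguous base fails the two stricter stages and is rescued only at the most permissive one, which is the behaviour a cascade is meant to provide. The all-`N` read stays unassigned.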
Affiliation(s)
- Matthias Scheuch
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
- Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
- Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
46
Wang Q, Jia P, Zhao Z. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization. Genome Med 2015; 7:2. [PMID: 25699093 PMCID: PMC4333248 DOI: 10.1186/s13073-015-0126-6] [Received: 12/11/2014] [Accepted: 01/05/2015]
Abstract
Fueled by widespread applications of high-throughput next-generation sequencing (NGS) technologies and the urgent need to counter threats from pathogenic viruses, large-scale studies have recently investigated virus integration in host genomes (for example, human tumor genomes), which may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and the resulting polymorphisms. These prevent reads from aligning readily to commonly used virus reference genomes and thus make virus integration sites difficult to detect. Another confounding factor is host genomic instability resulting from virus insertions. To tackle these challenges and improve our ability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection by customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open-source package VirusFinder 2, available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
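The core idea behind iterative reference customization can be shown with a toy Python loop: reads too divergent to align to the original reference can align once the reference has been updated toward the sample. This is not VERSE's implementation. It assumes ungapped, mismatch-only read placement, a fixed mismatch limit (`max_mm`) and simple majority-vote consensus, all of which are illustrative choices.

```python
from collections import Counter


def best_offset(read, ref, max_mm):
    """Ungapped placement: (offset, mismatches) with fewest mismatches <= max_mm, else None."""
    best = None
    for off in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        if mm <= max_mm and (best is None or mm < best[1]):
            best = (off, mm)
    return best


def customize_reference(ref, reads, max_mm=1, max_iter=10):
    """Map reads, rebuild the reference by majority vote, repeat until stable."""
    for iteration in range(1, max_iter + 1):
        pileup = [Counter() for _ in ref]
        mapped = 0
        for read in reads:
            hit = best_offset(read, ref, max_mm)
            if hit is None:
                continue  # too divergent from the current reference
            mapped += 1
            off = hit[0]
            for i, base in enumerate(read):
                pileup[off + i][base] += 1
        # Uncovered positions keep the current reference base.
        new_ref = "".join(col.most_common(1)[0][0] if col else base
                          for col, base in zip(pileup, ref))
        if new_ref == ref:
            return ref, iteration, mapped
        ref = new_ref
    return ref, max_iter, mapped


# Toy data: the sample differs from REF at two nearby positions (indices 6 and 8).
REF = "GATTACAGCTCG"
READS = ["GATTAC", "TTACTG", "CTGATC", "TGATCG"]
print(sum(best_offset(r, REF, 1) is not None for r in READS))  # 2
print(customize_reference(REF, READS))  # ('GATTACTGATCG', 3, 4)
```

In this toy run, only two of the four reads align to the original reference. After the first update corrects index 6, the remaining two fall within the mismatch limit, and the third pass confirms that the reference has stopped changing.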
Affiliation(s)
- Qingguo Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
47
Calistri A, Palu G. Editorial Commentary: Unbiased Next-Generation Sequencing and New Pathogen Discovery: Undeniable Advantages and Still-Existing Drawbacks. Clin Infect Dis 2015; 60:889-91. [DOI: 10.1093/cid/ciu913]
48
Rawat A, Engelthaler DM, Driebe EM, Keim P, Foster JT. MetaGeniE: characterizing human clinical samples using deep metagenomic sequencing. PLoS One 2014; 9:e110915. [PMID: 25365329 PMCID: PMC4218713 DOI: 10.1371/journal.pone.0110915] [Received: 07/30/2014] [Accepted: 09/19/2014]
Abstract
With the decreasing cost of next-generation sequencing, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. A primary challenge of clinical metagenomic sequencing is rapidly filtering out human reads in order to survey for pathogens with high specificity and sensitivity. Metagenomes are inherently variable for several reasons:

- the different microbes in a sample and their relative abundance;
- the size and architecture of their genomes;
- factors such as the amount of target DNA in tissue samples (i.e. human versus pathogen DNA concentration).

In sequencing datasets, this variation typically manifests as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. Beyond these compositional challenges, the large numbers of reads generated by high-throughput deep sequencing pose immense computational challenges. Accurate identification of pathogens is confounded by individual reads mapping to multiple reference genomes, owing to gene similarity among taxa in the community or to close relatives in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity and speed of detection. How efficiently pathogens can be detected in clinical samples depends largely on the desired taxonomic resolution. We have developed an efficient strategy that identifies “all against all” relationships between sequencing reads and reference genomes. Our approach scales to large reference databases and then reconstructs genomes by aggregating global and local alignments, allowing genetic characterization of pathogens at higher taxonomic resolution. Our results were consistent with strain-level SNP genotyping and with bacterial identification from laboratory culture.
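A widely used way to handle reads that align equally well to several reference genomes is to assign them to the lowest common ancestor (LCA) of the best-matching taxa. The abstract does not say that MetaGeniE uses this rule. The Python sketch below swaps in that common technique, and its toy taxonomy, scores and `tolerance` parameter are hypothetical.

```python
def lineage(taxon, parent):
    """Path from the root down to `taxon`, following child -> parent links."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lca(taxa, parent):
    """Deepest taxon shared by the lineages of all `taxa`."""
    common = None
    for level in zip(*(lineage(t, parent) for t in taxa)):
        if all(x == level[0] for x in level):
            common = level[0]
        else:
            break
    return common


def assign_reads(hits, parent, tolerance=0):
    """hits: {read: {taxon: alignment score}}; keep near-best hits, report their LCA."""
    result = {}
    for read, scores in hits.items():
        best = max(scores.values())
        near_best = [t for t, s in scores.items() if s >= best - tolerance]
        result[read] = lca(near_best, parent)
    return result


# Toy taxonomy (child -> parent) and alignment scores, invented for illustration.
parent = {
    "E_coli_K12": "E_coli", "E_coli_O157": "E_coli", "E_coli": "Escherichia",
    "S_enterica": "Salmonella", "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
}
hits = {
    "r1": {"E_coli_K12": 98, "E_coli_O157": 98, "S_enterica": 80},
    "r2": {"E_coli_O157": 99, "E_coli_K12": 90},
    "r3": {"E_coli_K12": 95, "S_enterica": 95},
}
print(assign_reads(hits, parent))
# {'r1': 'E_coli', 'r2': 'E_coli_O157', 'r3': 'Enterobacteriaceae'}
```

Raising `tolerance` makes assignments more conservative: in this toy data, `tolerance=10` moves read `r2` from the strain level up to `E_coli`.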
Affiliation(s)
- Arun Rawat
- Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- David M. Engelthaler
- Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- Elizabeth M. Driebe
- Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- Paul Keim
- Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, United States of America
- Jeffrey T. Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, United States of America
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, United States of America
49
Ho T, Tzanetakis IE. Development of a virus detection and discovery pipeline using next generation sequencing. Virology 2014; 471-473:54-60. [PMID: 25461531 DOI: 10.1016/j.virol.2014.09.019] [Received: 07/29/2014] [Revised: 08/28/2014] [Accepted: 09/22/2014]
Abstract
Next-generation sequencing (NGS) has revolutionized virus discovery. Nevertheless, a vertical pipeline, from sample preparation to data analysis, has not been available to the plant virology community. We developed a degenerate oligonucleotide-primed RT-PCR method with multiple barcodes for NGS. We also constructed VirFind, a bioinformatics tool specifically for virus detection and discovery that is able to:

(i) map and filter out host reads;
(ii) deliver files of virus reads with taxonomic information and the corresponding Blastn and Blastx reports;
(iii) perform conserved domain searches on reads of unknown origin.

The pipeline was used to process more than 30 samples. This resulted in the detection of all viruses known to infect the processed samples, the extension of the genomic sequences of others, and the discovery of several novel viruses. VirFind was tested by four external users with datasets from plants or insects, demonstrating its potential as a universal virus detection and discovery tool.
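Step (i), removing host reads before viral classification, can be approximated with a k-mer screen. VirFind maps reads to the host instead. The Python sketch below swaps in exact k-mer matching, and the value of `k`, the `min_frac` threshold and the toy sequences are hypothetical. Host k-mers are collected from both strands because a read may derive from either.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_host(reads, host_genome, k=8, min_frac=0.5):
    """Split (name, seq) reads into kept (non-host) and removed (host-like) names."""
    host = kmer_set(host_genome, k) | kmer_set(revcomp(host_genome), k)
    kept, removed = [], []
    for name, seq in reads:
        ks = kmer_set(seq, k)
        frac = len(ks & host) / len(ks) if ks else 0.0
        (removed if frac >= min_frac else kept).append(name)
    return kept, removed


# Toy "host" genome and reads (hypothetical).
HOST = "ATGCGTACCTTAGGCATCGATTGACCGTAA"
READS = [
    ("h1", HOST[2:22]),              # host read, forward strand
    ("h2", revcomp(HOST[5:25])),     # host read, reverse strand
    ("v1", "GGGGGCCCCCGGGGGCCCCC"),  # unrelated to the host
]
print(filter_host(READS, HOST))  # (['v1'], ['h1', 'h2'])
```

Only the kept reads would then move on to taxonomic assignment and, for reads of unknown origin, to the more expensive searches listed in steps (ii) and (iii).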
Affiliation(s)
- Thien Ho
- Department of Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville, AR, USA.
- Ioannis E Tzanetakis
- Department of Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville, AR, USA.
50
Tae H, Karunasena E, Bavarva JH, McIver LJ, Garner HR. Large scale comparison of non-human sequences in human sequencing data. Genomics 2014; 104:453-8. [PMID: 25173571 DOI: 10.1016/j.ygeno.2014.08.009] [Received: 08/21/2013] [Revised: 08/17/2014] [Accepted: 08/19/2014]
Abstract
Several studies have demonstrated that unmapped reads in next-generation sequencing data can be used to identify infectious agents or structural variants. However, there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify non-human sequences shared across samples, whether from infectious agents or from putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project. On average, 0.13% of reads showed similarity to non-human genomes. We compared results among sample groups defined by ethnicity, sequencing center and enrichment method (whole-genome vs. exome sequencing). Sequencing centers had specific signatures of contaminating genomes that act as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.
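The kind of summary described here can be sketched in Python: an average non-human read fraction across data files, plus a dominant contaminant signature per sequencing center. The record layout, the genome labels and the counts below are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def summarize(samples, top=1):
    """samples: iterable of (center, {genome: n_reads}, total_reads), one per data file.

    Returns the mean per-file fraction of non-human reads and, for each center,
    the `top` genomes contributing the most non-human reads.
    """
    per_center = defaultdict(Counter)
    fractions = []
    for center, hits, total in samples:
        per_center[center].update(hits)
        fractions.append(sum(hits.values()) / total)
    mean_fraction = sum(fractions) / len(fractions)
    signatures = {c: [g for g, _ in counts.most_common(top)]
                  for c, counts in per_center.items()}
    return mean_fraction, signatures


# Hypothetical per-file counts, invented for illustration.
samples = [
    ("center1", {"genome_A": 30, "genome_B": 5}, 10_000),
    ("center1", {"genome_A": 20}, 10_000),
    ("center2", {"genome_C": 40, "genome_A": 2}, 20_000),
]
mean_fraction, signatures = summarize(samples)
print(f"{mean_fraction:.4%}")  # 0.2533%
print(signatures)  # {'center1': ['genome_A'], 'center2': ['genome_C']}
```

Averaging per-file fractions weights every file equally, whereas pooling all reads first would weight large files more heavily. The abstract does not state which convention was used.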
Affiliation(s)
- Hongseok Tae
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Enusha Karunasena
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Jasmin H Bavarva
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Lauren J McIver
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Harold R Garner
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA