1
Qin G, Liu Z, Xie L. Multiple Omics Data Integration. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
2
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS)-based proteomics and Next Generation Sequencing-based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Because of the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the annotation of genome sequences. To date, most such tools have not addressed the scalability issues inherent in proteogenomic data analysis, where the search database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a 3 GB genome. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, with a critical analysis of the techniques' relative merits and potential pitfalls. We also identify potential bottlenecks and offer recommendations that can be incorporated into the future design of these workflows to ensure scalability as proteogenomic data grow. Lastly, we make the case that high-performance computing (HPC) solutions may be the best way to ensure the scalability of future big data proteogenomic analysis.
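The six-frame translation database mentioned in this abstract can be illustrated with a short sketch. It assumes the standard genetic code (NCBI translation table 1), translates the forward strand and its reverse complement at offsets 0, 1 and 2, and splits each frame at stop codons to obtain candidate entries for a search database. The function names, the frame labels (+1..+3, -1..-3) and the `min_len` cutoff are illustrative assumptions, not taken from any tool reviewed in the survey.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; ambiguous codons become 'X', stop codons become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return six frames keyed +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def search_database_entries(dna: str, min_len: int = 7) -> list[str]:
    """Split every frame at stop codons and keep fragments long enough to yield peptides."""
    entries = []
    for frame in six_frame_translation(dna).values():
        entries.extend(fragment for fragment in frame.split("*") if len(fragment) >= min_len)
    return entries
```

For example, `six_frame_translation("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")["+1"]` gives `"MAIVMGR*KGAR*"`. Because every genomic position contributes to six frames, a database built this way grows far beyond a curated protein database, which is the scalability problem the survey describes.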
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
3
Choong WK, Wang JH, Sung TY. MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides for proteogenomic analysis. J Proteomics 2020; 223:103819. [PMID: 32407886 DOI: 10.1016/j.jprot.2020.103819] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2019] [Revised: 05/04/2020] [Accepted: 05/09/2020] [Indexed: 12/12/2022]
Abstract
Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are commonly used to convert SNV annotations into SAV-harboring protein sequences: one generates sequences that each contain exactly one SAV, and the other generates a single sequence containing all SAVs. However, both may neglect SAV combinations, e.g., haplotypes, present in biological samples. Therefore, it is necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach that selects a minimized number of SAV-harboring protein sequences generated by the exhaustive approach, while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study of known haplotype variations of TAS2R38 justifies the need for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study of OR2T27 with five SAVs and by real experimental data from the HEK293 cell line. Furthermore, simulating somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that, when an appropriate somatic and germline SAV integration strategy is adopted, MinProtMaxVP is adaptable to both labeling and label-free mass spectrometry-based experiments. SIGNIFICANCE: We present MinProtMaxVP, a novel approach to generate SAV-harboring protein sequences for constructing a customized protein sequence database, which is used in database searching for variant peptide identification. This approach outperforms existing approaches in including all possible variant peptides in the generated protein sequences, potentially leading to the identification of more variant peptides in proteogenomic analysis.
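The set-covering idea described in this abstract can be sketched as follows. The code enumerates every non-empty combination of SAVs for one protein (the exhaustive candidates), digests each variant sequence with a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), records the peptides that carry at least one SAV, and then selects sequences with the classic greedy set-cover heuristic. Greedy selection is a swapped-in approximation chosen for brevity: the abstract does not say which set-cover solver MinProtMaxVP uses, and greedy may return more sequences than an exact solution. The `Sav` representation, 0-based distinct positions, and all function names are hypothetical.

```python
from itertools import combinations

Sav = tuple[int, str]  # hypothetical: (0-based position, alternative residue)


def tryptic_spans(seq: str) -> list[tuple[int, int]]:
    """Simplified trypsin rule: cleave after K/R unless the next residue is P."""
    spans, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            spans.append((start, i + 1))
            start = i + 1
    if start < len(seq):
        spans.append((start, len(seq)))
    return spans


def variant_peptides(ref: str, combo: tuple[Sav, ...]) -> set[tuple[int, str]]:
    """Apply the SAVs in `combo` and return (start, peptide) pairs overlapping an SAV."""
    seq = list(ref)
    for pos, alt in combo:
        seq[pos] = alt
    mutant = "".join(seq)
    positions = [pos for pos, _ in combo]
    return {
        (start, mutant[start:end])
        for start, end in tryptic_spans(mutant)  # digest the mutant: SAVs can alter K/R sites
        if any(start <= p < end for p in positions)
    }


def greedy_min_sequences(ref: str, savs: list[Sav]):
    """Greedy set cover over all SAV combinations; returns (chosen combos, all variant peptides)."""
    candidates = {
        combo: variant_peptides(ref, combo)
        for k in range(1, len(savs) + 1)
        for combo in combinations(savs, k)
    }
    universe = set().union(*candidates.values())
    chosen, covered = [], set()
    while covered != universe:
        # Pick the combination covering the most still-uncovered variant peptides.
        best = max(candidates, key=lambda combo: len(candidates[combo] - covered))
        chosen.append(best)
        covered |= candidates[best]
    return chosen, universe
```

With two SAVs in different tryptic peptides, one doubly mutated sequence covers both variant peptides. With two SAVs in the same peptide, three sequences are needed (each single variant plus the combined haplotype), which is why single-SAV and all-SAV sequences alone can miss peptides.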
Affiliation(s)
- Wai-Kok Choong
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
- Jen-Hung Wang
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
4
Li Y, Wang G, Tan X, Ouyang J, Zhang M, Song X, Liu Q, Leng Q, Chen L, Xie L. ProGeo-neo: a customized proteogenomic workflow for neoantigen prediction and selection. BMC Med Genomics 2020; 13:52. [PMID: 32241270 PMCID: PMC7118832 DOI: 10.1186/s12920-020-0683-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/30/2022]
Abstract
BACKGROUND Neoantigens can be differentially recognized by T cell receptors (TCRs) because these sequences are derived from mutant proteins and are unique to the tumor. The discovery of neoantigens is the first key step in tumor-specific antigen (TSA)-based immunotherapy. In high-throughput tumor genomic analysis, each missense mutation can potentially give rise to multiple neopeptides, resulting in a vast total number, but only a small percentage of these peptides may achieve immunodominant status with a given major histocompatibility complex (MHC) class I allele. Specific identification of immunogenic candidate neoantigens is consequently a major challenge. Currently, almost all neoantigen prediction tools are based on genomic data. RESULTS Here we report the construction of the proteogenomics prediction of neoantigen (ProGeo-neo) pipeline, which incorporates the following modules: mining tumor-specific antigens from next-generation sequencing genomic and mRNA expression data, predicting the binding of mutant peptides to class I MHC molecules with the latest netMHCpan (v4.0), verifying MHC-binding peptides with MaxQuant by searching mass spectrometry proteomics data against a customized protein database, and checking potential T-cell immunogenicity with additional screening methods. The ProGeo-neo pipeline implements a proteogenomic strategy, and the neopeptides it identified were of much higher quality than those identified using genomic data alone. CONCLUSIONS The pipeline was built using genomic and proteomic data from the Jurkat leukemia cell line but is generally applicable to solid cancer research as well. As massively parallel sequencing and proteomic profiling become more widespread, this proteogenomic workflow should be useful for neoantigen-oriented research and immunotherapy.
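The statement that each missense mutation can give rise to multiple neopeptides follows from simple windowing: every peptide of MHC class I length that overlaps the mutated residue is a candidate. The sketch below enumerates those windows for lengths 8 to 11, a common range for class I ligands; it is an illustration under that assumption, not ProGeo-neo's own code, and the function name and defaults are hypothetical. Binding prediction and MS verification, the later steps of the pipeline, are not shown.

```python
def mutant_windows(protein: str, pos: int, alt: str, lengths=range(8, 12)) -> list[str]:
    """Return every peptide of the given lengths that contains the substituted residue.

    `pos` is 0-based; the substitution is applied before windowing.
    """
    if not 0 <= pos < len(protein):
        raise ValueError("mutation position outside protein")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    peptides = []
    for length in lengths:
        # A window must start no later than `pos` and must end inside the protein.
        first = max(0, pos - length + 1)
        last = min(pos, len(mutant) - length)
        peptides.extend(mutant[start:start + length] for start in range(first, last + 1))
    return peptides
```

For a mutation far from either terminus, a window of length L can start at L different positions, so lengths 8 to 11 yield 8 + 9 + 10 + 11 = 38 candidates per missense mutation, before any binding or proteomic filter is applied.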
Affiliation(s)
- Yuyu Li
- Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China; Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Guangzhi Wang
- Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China; Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Xiaoxiu Tan
- Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Jian Ouyang
- Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Menghuan Zhang
- Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Xiaofeng Song
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
- Qi Liu
- Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 20009, China
- Qibin Leng
- Affiliated Cancer Hospital & Institute of Guangzhou Medical University, 78 Heng Zhi Gang, Lu Hu Road, Guangzhou, 510095, China
- Lanming Chen
- Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China
- Lu Xie
- Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
5
Ma WT, Liu ZY, Chen XZ, Lin ZL, Zheng ZB, Miao WG, Xie SQ. A protein identification algorithm for tandem mass spectrometry by incorporating the abundance of mRNA into a binomial probability scoring model. J Proteomics 2019; 197:53-59. [PMID: 30790687 DOI: 10.1016/j.jprot.2019.02.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/06/2018] [Revised: 02/15/2019] [Accepted: 02/17/2019] [Indexed: 12/17/2022]
Abstract
Peptide-spectrum match (PSM) scoring between experimental and theoretical spectra is a key step in the identification of proteins using mass spectrometry (MS)-based proteomics. Efficient protein identification from MS/MS data remains a challenge. Using RNA-seq data increases the number of proteins identified, by reconstructing a custom search database and by integrating mRNA abundance into post-PSM false discovery rate estimation. However, this process lacks an algorithm that incorporates mRNA abundance into the core PSM scoring model. Therefore, we developed a novel PSM scoring model that incorporates mRNA abundance for improved peptide and protein identification. In the new algorithm, mRNA abundance was transformed into the prior probability of protein identification and integrated into PSM re-scoring using a binomial probability distribution model. Compared with other algorithms on five MS/MS datasets (human, rat, zebrafish, yeast, and Arabidopsis thaliana), the minimum improvement ratios for peptides and protein groups were 3.39%-9.79% and 0.48%-8.16%, respectively, across the datasets. The new strategy offers an effective solution for MS-based identification of peptides and proteins. SIGNIFICANCE: The new algorithm identifies proteins by quantifying mRNA abundance (FPKM) and incorporating it into a scoring model for peptide-spectrum matches. Improving peptide and protein identification from MS/MS datasets is important in proteomics research.
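A binomial PSM score combined with an mRNA-derived prior can be sketched as below. The binomial part follows the usual formulation: if a theoretical spectrum has n fragment ions and each matches a random experimental peak with probability p, the chance of k or more matches is the upper binomial tail, P(X ≥ k) = Σ_{i=k}^{n} C(n, i) p^i (1 − p)^(n − i), and the score is its negative log10. The abstract does not state how FPKM is turned into a prior, so the saturating map `fpkm / (fpkm + half_saturation)`, the floor `min_prior`, and the additive log-prior combination are hypothetical choices for illustration, not the authors' model.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random fragment matches."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_psm_score(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """Higher is better: -log10 of the probability of matching this well by chance."""
    return -math.log10(binomial_tail(n_theoretical, n_matched, p_random))


def mrna_prior(fpkm: float, half_saturation: float = 10.0, min_prior: float = 0.01) -> float:
    """Hypothetical mapping of transcript abundance (FPKM) to a prior in [min_prior, 1)."""
    return max(min_prior, fpkm / (fpkm + half_saturation))


def rescored_psm(n_theoretical: int, n_matched: int, p_random: float, fpkm: float) -> float:
    """Hypothetical combination: add the log10 prior to the binomial score."""
    return binomial_psm_score(n_theoretical, n_matched, p_random) + math.log10(mrna_prior(fpkm))
```

Under these assumptions, two PSMs with identical spectral evidence are ranked by transcript abundance: a peptide from a highly expressed gene loses almost nothing (log10 of a prior near 1), while one from an unexpressed gene is penalized by up to two score units at the default floor.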
Affiliation(s)
- Wen-Tai Ma
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Zhao-Yu Liu
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Xiao-Zhou Chen
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650031, China
- Zhen-Liang Lin
- Department of General Surgery, The Affiliated Cangnan Hospital of Wenzhou Medical University, Wenzhou 325800, China
- Zhong-Bing Zheng
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Wei-Guo Miao
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Shang-Qian Xie
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
6
Dimitrakopoulos L, Prassas I, Sieuwerts AM, Diamandis EP, Martens JWM, Charames GS. Proteome-wide onco-proteogenomic somatic variant identification in ER-positive breast cancer. Clin Biochem 2019; 66:63-75. [PMID: 30684468 DOI: 10.1016/j.clinbiochem.2019.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/09/2018] [Revised: 01/14/2019] [Accepted: 01/18/2019] [Indexed: 01/19/2023]
Abstract
BACKGROUND Recent advances in mass spectrometric instrumentation and bioinformatics have critically contributed to the field of proteogenomics. Nonetheless, whether this integrative approach has matured enough to effectively reveal the flow of genetic variants from DNA to proteins remains unclear. The objective of this study was to detect somatically acquired protein variants in breast cancer specimens for which full genome and transcriptome data were already available (BASIS cohort). METHODS LC-MS/MS shotgun proteomic results from 21 breast cancer tissues were coupled to DNA sequencing data to identify variants at the protein level, and were then used to associate protein expression with gene expression levels. RESULTS Here we report the observation of three sequencing-predicted single amino acid somatic variants. The sensitivity of single amino acid variant (SAAV) detection based on DNA sequencing-predicted single nucleotide variants was 0.4%. This sensitivity increased to 0.6% when all the predicted variants were filtered for MS "compatibility" and further increased to 2.9% when only proteins with at least one wild-type peptide detected were taken into account. A correlation of mRNA abundance with variant peptide detection revealed that transcripts for which variant proteins were detected ranked among the top 6.3% most abundant transcripts. The variants were also detected in highly abundant proteins, thus establishing transcript and protein abundance and MS "compatibility" as the main factors affecting onco-proteogenomic variant identification. CONCLUSIONS While proteomics fails to identify the vast majority of exome DNA variants in the resulting proteome, its ability to detect a small subset of SAAVs could prove valuable for precision medicine applications.
Affiliation(s)
- Lampros Dimitrakopoulos
- Department of Laboratory Medicine and Pathobiology, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON M5G 1X5, Canada
- Ioannis Prassas
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON M5G 1X5, Canada
- Anieta M Sieuwerts
- Department of Medical Oncology and Cancer Genomics Netherlands, Erasmus MC Cancer Institute, Erasmus University Medical Center, Wytemaweg 80, 3015 CN Rotterdam, The Netherlands
- Eleftherios P Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Clinical Biochemistry, University Health Network, 190 Elizabeth Street, Toronto, ON M5G 2C4, Canada
- John W M Martens
- Department of Medical Oncology and Cancer Genomics Netherlands, Erasmus MC Cancer Institute, Erasmus University Medical Center, Wytemaweg 80, 3015 CN Rotterdam, The Netherlands
- George S Charames
- Department of Laboratory Medicine and Pathobiology, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON M5G 1X5, Canada
7
Groves IJ, Coleman N. Human papillomavirus genome integration in squamous carcinogenesis: what have next-generation sequencing studies taught us? J Pathol 2018; 245:9-18. [PMID: 29443391 DOI: 10.1002/path.5058] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/05/2017] [Revised: 02/01/2018] [Accepted: 02/06/2018] [Indexed: 12/31/2022]
Abstract
Human papillomavirus (HPV) infection is associated with ∼5% of all human cancers, including a range of squamous cell carcinomas. Persistent infection by high-risk HPVs (HRHPVs) is associated with the integration of virus genomes (which are usually stably maintained as extrachromosomal episomes) into host chromosomes. Although HRHPV integration rates differ across human sites of infection, this process appears to be an important event in HPV-associated neoplastic progression, leading to deregulation of virus oncogene expression, modulation of host gene expression, and further genomic instability. However, the mechanisms by which HRHPV integration occurs and by which the subsequent gene expression changes take place are incompletely understood. The advent of next-generation sequencing (NGS) of both RNA and DNA has allowed powerful interrogation of the association of HRHPVs with human disease, including precise determination of the sites of integration and the genomic rearrangements at integration loci. In turn, these data have indicated that integration occurs through two main mechanisms: looping integration and direct insertion. Improved understanding of integration sites is allowing further investigation of the factors that provide a competitive advantage to some integrants during disease progression. Furthermore, advanced approaches to the generation of genome-wide samples have given novel insights into the three-dimensional interactions within the nucleus, which could act as another layer of epigenetic control of both virus and host transcription. It is hoped that further advances in NGS techniques and analysis will not only allow the examination of further unanswered questions regarding HPV infection, but also direct new approaches to treating HPV-associated human disease. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Affiliation(s)
- Ian J Groves
- Department of Pathology, University of Cambridge, Cambridge, UK
8
Kontostathi G, Zoidakis J, Anagnou NP, Pappa KI, Vlahou A, Makridakis M. Proteomics approaches in cervical cancer: focus on the discovery of biomarkers for diagnosis and drug treatment monitoring. Expert Rev Proteomics 2017; 13:731-45. [PMID: 27398979 DOI: 10.1080/14789450.2016.1210514] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 02/08/2023]
Abstract
INTRODUCTION HPV accounts for the majority of cervical cancer cases. Although a diagnostic tool (the Pap test) is widely available, cervical cancer incidence remains high worldwide, especially in developing countries, largely because of the suboptimal sensitivity of the Pap test and its unavailability in those countries. AREAS COVERED Proteomics approaches have been used to understand the association of HPV with cervical cancer pathology, as well as to discover putative biomarkers for early cervical cancer diagnosis and to study drug modes of action. Expert commentary: The present review summarizes the latest in vitro and in vivo proteomic studies for the discovery of putative cervical cancer biomarkers and the evaluation of available drugs and treatments.
Affiliation(s)
- Georgia Kontostathi
- Biotechnology Division, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece; Laboratory of Biology, University of Athens School of Medicine, Athens, Greece
- Jerome Zoidakis
- Biotechnology Division, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece
- Nicholas P Anagnou
- Laboratory of Biology, University of Athens School of Medicine, Athens, Greece; Cell and Gene Therapy Laboratory, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece
- Kalliopi I Pappa
- Cell and Gene Therapy Laboratory, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece; First Department of Obstetrics and Gynecology, University of Athens School of Medicine, Athens, Greece
- Antonia Vlahou
- Biotechnology Division, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece
- Manousos Makridakis
- Biotechnology Division, Biomedical Research Foundation, Academy of Athens (BRFAA), Athens, Greece
9
Dimitrakopoulos L, Prassas I, Diamandis EP, Charames GS. Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction. Crit Rev Clin Lab Sci 2017; 54:414-432. [DOI: 10.1080/10408363.2017.1384446] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Affiliation(s)
- Lampros Dimitrakopoulos
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Ioannis Prassas
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Eleftherios P. Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- George S. Charames
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
10
A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection. mSphere 2017; 2:mSphere00307-17. [PMID: 28932815 PMCID: PMC5597969 DOI: 10.1128/msphere.00307-17] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/10/2017] [Accepted: 08/23/2017] [Indexed: 11/20/2022]
Abstract
The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory.
Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials, clinical materials, and biological products. Therefore, HTS can be a powerful tool for supplementing current methods for demonstrating the absence of adventitious or unwanted viruses in biological products, particularly when using a new cell line. However, HTS is a complex technology with different platforms, which needs standardization for evaluation of biologics. This collaborative study was undertaken to investigate detection of different virus types using two different HTS platforms. The results of the independently performed studies demonstrated a similar sensitivity of virus detection, regardless of the different sample preparation and processing procedures and bioinformatic analyses done in the three laboratories. Comparable HTS detection of different virus types supports future development of reference virus materials for standardization and validation of different HTS platforms.
11
Universal Human Papillomavirus Typing Assay: Whole-Genome Sequencing following Target Enrichment. J Clin Microbiol 2016; 55:811-823. [PMID: 27974548 PMCID: PMC5328449 DOI: 10.1128/jcm.02132-16] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/21/2016] [Accepted: 12/09/2016] [Indexed: 12/19/2022]
Abstract
We designed a universal human papillomavirus (HPV) typing assay based on target enrichment and whole-genome sequencing (eWGS). The RNA bait included 23,941 probes targeting 191 HPV types and 12 probes targeting beta-globin as a control. We used the Agilent SureSelect XT2 protocol for library preparation, Illumina HiSeq 2500 for sequencing, and CLC Genomics Workbench for sequence analysis. Mapping stringency for type assignment was determined based on 8 (6 HPV-positive and 2 HPV-negative) control samples. Using the optimal mapping conditions, types were assigned to 24 blinded samples. eWGS results were 100% concordant with Linear Array (LA) genotyping results for 9 plasmid samples and fully or partially concordant for 9 of the 15 cervical-vaginal samples, with 95.83% overall type-specific concordance for LA genotyping. eWGS identified 7 HPV types not included in the LA genotyping. Since this method does not involve degenerate primers targeting HPV genomic regions, PCR bias in genotype detection is minimized. With further refinements aimed at reducing cost and increasing throughput, this first application of eWGS for universal HPV typing could be a useful method to elucidate HPV epidemiology.
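Type assignment from mapping results, and type-specific concordance with a reference genotyping method, can be sketched as follows. The abstract says only that mapping stringency was tuned on control samples; the per-type read-count and genome-coverage thresholds here, the `TypeMapping` record, and the concordance definition (agreement on each sample-type pair for types on the comparator's panel) are hypothetical stand-ins, not the authors' CLC Genomics Workbench settings or their exact statistic.

```python
from dataclasses import dataclass


@dataclass
class TypeMapping:
    """Hypothetical per-type summary of reads mapped to one HPV reference genome."""
    hpv_type: str
    mapped_reads: int
    genome_coverage: float  # fraction of the type's reference genome covered by >= 1 read


def assign_types(mappings: list[TypeMapping], min_reads: int = 10,
                 min_coverage: float = 0.2) -> set[str]:
    """Call a type present only if it passes both (hypothetical) stringency thresholds."""
    return {m.hpv_type for m in mappings
            if m.mapped_reads >= min_reads and m.genome_coverage >= min_coverage}


def type_specific_concordance(calls_a: dict[str, set[str]], calls_b: dict[str, set[str]],
                              panel: list[str]) -> float:
    """Fraction of (sample, type) pairs, over samples in both call sets and types on
    `panel`, where both methods agree on presence or absence."""
    agree = total = 0
    for sample in calls_a.keys() & calls_b.keys():
        for hpv_type in panel:
            total += 1
            agree += (hpv_type in calls_a[sample]) == (hpv_type in calls_b[sample])
    return agree / total if total else float("nan")
```

Restricting the comparison to `panel` mirrors the situation in the abstract: types detected by eWGS but absent from the comparator's panel cannot be scored for concordance and are reported separately.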
12
Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J. Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics 2016; 13:185-99. [DOI: 10.1586/14789450.2016.1132169] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]