1. Tian L, Xie Y, Xie Z, Tian J, Tian W. AtacAnnoR: a reference-based annotation tool for single cell ATAC-seq data. Brief Bioinform 2023; 24:bbad268. [PMID: 37497729 DOI: 10.1093/bib/bbad268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 06/14/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] [Open Access]
Abstract
Here, we present AtacAnnoR, a two-round annotation method for scATAC-seq data using well-annotated scRNA-seq data as reference. We evaluate AtacAnnoR's performance against six competing methods on 11 benchmark datasets. Our results show that AtacAnnoR achieves the highest mean accuracy and the highest mean balanced accuracy and performs particularly well when unpaired scRNA-seq data are used as the reference. Furthermore, AtacAnnoR implements a 'Combine and Discard' strategy to further improve annotation accuracy when annotations of multiple references are available. AtacAnnoR has been implemented in an R package and can be directly integrated into currently popular scATAC-seq analysis pipelines.
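The two metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions (accuracy as the fraction of correctly labelled cells, balanced accuracy as the mean of per-class recall); it is not AtacAnnoR's own evaluation code, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so rare cell types weigh as much as common ones."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: 8 T cells and 2 B cells; one B cell is mislabelled.
true_labels = ["T"] * 8 + ["B"] * 2
predicted_labels = ["T"] * 8 + ["T", "B"]

acc = accuracy(true_labels, predicted_labels)            # 9 of 10 correct
bal_acc = balanced_accuracy(true_labels, predicted_labels)  # (1.0 + 0.5) / 2
```

In this toy case the accuracy is 0.9 while the balanced accuracy is 0.75, which shows how the second metric penalizes errors on a rare cell type more strongly.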
Affiliation(s)
- Lejin Tian
  - State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Yunxiao Xie
  - State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Zhaobin Xie
  - State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Weidong Tian
  - State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
  - Children's Hospital of Fudan University, Shanghai, China
  - Children's Hospital of Shandong University, Jinan, China
2. Kohler D, Tsai TH, Verschueren E, Huang T, Hinkle T, Phu L, Choi M, Vitek O. MSstatsPTM: Statistical Relative Quantification of Posttranslational Modifications in Bottom-Up Mass Spectrometry-Based Proteomics. Mol Cell Proteomics 2023; 22:100477. [PMID: 36496144 PMCID: PMC9860394 DOI: 10.1016/j.mcpro.2022.100477] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/29/2022] [Revised: 11/18/2022] [Accepted: 11/29/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is increasingly used to detect changes in posttranslational modifications (PTMs) in samples from different conditions. Analysis of data from such experiments faces numerous statistical challenges. These include the low abundance of modified proteoforms, the small number of observed peptides that span modification sites, and confounding between changes in the abundance of PTM and the overall changes in the protein abundance. Therefore, statistical approaches for detecting differential PTM abundance must integrate all the available information pertaining to a PTM site and consider all the relevant sources of confounding and variation. In this manuscript, we propose such a statistical framework, which is versatile, accurate, and leads to reproducible results. The framework requires an experimental design, which quantifies, for each sample, both peptides with PTMs and peptides from the same proteins with no modification sites. The proposed framework supports both label-free and tandem mass tag-based LC-MS/MS acquisitions. The statistical methodology separately summarizes the abundances of peptides with and without the modification sites, by fitting separate linear mixed effects models appropriate for the experimental design. Next, model-based inferences regarding the PTM and the protein-level abundances are combined to account for the confounding between these two sources. Evaluations on computer simulations, a spike-in experiment with known ground truth, and three biological experiments with different organisms, modification types, and data acquisition types demonstrate the improved fold change estimation and detection of differential PTM abundance, as compared to currently used approaches. The proposed framework is implemented in the free and open-source R/Bioconductor package MSstatsPTM.
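The step in which PTM-level and protein-level inferences are combined can be sketched as follows. This is a minimal Python illustration and not the MSstatsPTM implementation: it assumes that the two model-based estimates are independent, that the adjustment subtracts the protein log2 fold change from the PTM log2 fold change, and that degrees of freedom are approximated with the Satterthwaite formula. The function name and the input numbers are hypothetical.

```python
import math


def adjust_ptm_for_protein(ptm_log2fc, ptm_se, ptm_df,
                           prot_log2fc, prot_se, prot_df):
    """Remove the protein-level change from a PTM-level change (sketch).

    Assumes the two estimates come from separately fitted, independent models.
    """
    # Adjusted effect: PTM change that is not explained by the protein change.
    adj_log2fc = ptm_log2fc - prot_log2fc
    # Variances of independent estimates add.
    adj_se = math.sqrt(ptm_se ** 2 + prot_se ** 2)
    # Satterthwaite approximation for the combined degrees of freedom.
    adj_df = (ptm_se ** 2 + prot_se ** 2) ** 2 / (
        ptm_se ** 4 / ptm_df + prot_se ** 4 / prot_df
    )
    t_stat = adj_log2fc / adj_se
    return adj_log2fc, adj_se, adj_df, t_stat


# Hypothetical numbers: the site changes by 1.5 (log2), the protein by 1.0.
adj_fc, adj_se, adj_df, t_stat = adjust_ptm_for_protein(
    ptm_log2fc=1.5, ptm_se=0.3, ptm_df=10,
    prot_log2fc=1.0, prot_se=0.4, prot_df=12,
)
```

With these inputs most of the apparent PTM change is attributable to the protein itself: the adjusted log2 fold change is 0.5 with a standard error of 0.5.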
Affiliation(s)
- Devon Kohler
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA
- Tsung-Heng Tsai
  - Department of Mathematical Sciences, Kent State University, Kent, Ohio, USA
- Erik Verschueren
  - ULUA BV, Antwerp, Belgium
  - MPL, Genentech, South San Francisco, California, USA
- Ting Huang
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA
- Trent Hinkle
  - MPL, Genentech, South San Francisco, California, USA
- Lilian Phu
  - MPL, Genentech, South San Francisco, California, USA
- Meena Choi
  - MPL, Genentech, South San Francisco, California, USA
- Olga Vitek
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA
3. Zhang J, Eteleeb AM, Rozycki EB, Inkman MJ, Ly A, Scharf RE, Jayachandran K, Krasnick BA, Mazur T, White NM, Fields RC, Maher CA. DANSR: A Tool for the Detection of Annotated and Novel Small RNAs. Noncoding RNA 2022; 8:ncrna8010009. [PMID: 35076605 PMCID: PMC8788476 DOI: 10.3390/ncrna8010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 12/22/2021] [Accepted: 01/10/2022] [Indexed: 11/16/2022] [Open Access]
Abstract
Existing small noncoding RNA analysis tools are optimized for processing short sequencing reads (17-35 nucleotides) to monitor microRNA expression. However, these strategies under-represent many biologically relevant classes of small noncoding RNAs in the 36-200 nucleotides length range (tRNAs, snoRNAs, etc.). To address this, we developed DANSR, a tool for the detection of annotated and novel small RNAs using sequencing reads with variable lengths (ranging from 17-200 nt). While DANSR is broadly applicable to any small RNA dataset, we applied it to a cohort of matched normal, primary, and distant metastatic colorectal cancer specimens to demonstrate its ability to quantify annotated small RNAs, discover novel genes, and calculate differential expression. DANSR is available as an open source tool.
Affiliation(s)
- Jin Zhang
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO 63110, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Abdallah M. Eteleeb
  - Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
- Emily B. Rozycki
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Matthew J. Inkman
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Amy Ly
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Russell E. Scharf
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Department of Computer Science & Engineering, Washington University, St. Louis, MO 63130, USA
- Kay Jayachandran
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Bradley A. Krasnick
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110, USA
- Thomas Mazur
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Nicole M. White
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Ryan C. Fields
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110, USA
- Christopher A. Maher
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Department of Biomedical Engineering, Washington University, St. Louis, MO 63105, USA
  - Correspondence:
4. Mehta S, Kumar P, Crane M, Johnson JE, Sajulga R, Nguyen DDA, McGowan T, Arntzen MØ, Griffin TJ, Jagtap PD. Updates on metaQuantome Software for Quantitative Metaproteomics. J Proteome Res 2021; 20:2130-2137. [PMID: 33683127 DOI: 10.1021/acs.jproteome.0c00960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
metaQuantome is a software suite that enables the quantitative analysis, statistical evaluation, and visualization of mass-spectrometry-based metaproteomics data. In the latest update of this software, we have provided several extensions, including a step-by-step training guide, the ability to perform statistical analysis on samples from multiple conditions, and a comparative analysis of metatranscriptomics data. The training module, accessed via the Galaxy Training Network, will help users to use the suite effectively for both functional and taxonomic analysis. We extend the ability of metaQuantome to now perform multi-data-point quantitative and statistical analyses so that studies with measurements across multiple conditions, such as time-course studies, can be analyzed. With an eye on the multiomics analysis of microbial communities, we have also initiated the use of metaQuantome statistical and visualization tools on outputs from metatranscriptomics data, which complements the metagenomic and metaproteomic analyses already available. For this, we have developed a tool named MT2MQ ("metatranscriptomics to metaQuantome"), which takes in outputs from the ASaiM metatranscriptomics workflow and transforms them so that the data can be used as an input for comparative statistical analysis and visualization via metaQuantome. We believe that these improvements to metaQuantome will facilitate the use of the software for quantitative metaproteomics and metatranscriptomics and will enable multipoint data analysis. These improvements will take us a step toward integrative multiomic microbiome analysis so as to understand dynamic taxonomic and functional responses of these complex systems in a variety of biological contexts. The updated metaQuantome and MT2MQ are open-source software and are available via the Galaxy Toolshed and GitHub.
Affiliation(s)
- Subina Mehta
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Marie Crane
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Ray Sajulga
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Dinh Duy An Nguyen
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Thomas McGowan
  - Minnesota Supercomputing Institute, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Magnus Ø Arntzen
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås 1432, Norway
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, United States
5. Lasch P, Schneider A, Blumenscheit C, Doellinger J. Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in Silico Peptide Mass Libraries. Mol Cell Proteomics 2020; 19:2125-2139. [PMID: 32998977 PMCID: PMC7710138 DOI: 10.1074/mcp.tir120.002061] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/26/2020] [Revised: 09/21/2020] [Indexed: 01/03/2023] [Open Access]
Abstract
Over the past decade, modern methods of mass spectrometry (MS) have emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. Although MALDI-TOF MS has already revolutionized the way microorganisms are identified, recent years have also witnessed substantial progress in the development of liquid chromatography (LC)-MS based proteomics for microbiological applications. For example, LC-tandem MS (LC-MS2) has been proposed for microbial characterization by means of multiple discriminative peptides that enable identification at the species, or sometimes at the strain level. However, such investigations can be laborious and time-consuming, especially if the experimental LC-MS2 data are tested against sequence databases covering a broad panel of different microbiological taxa. In this proof of concept study, we present an alternative bottom-up proteomics method for microbial identification. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. Peptide masses are then extracted from MS1 data and systematically tested against an in silico library of all possible peptide mass data compiled in-house. The library has been computed from the UniProt Knowledgebase covering Swiss-Prot and TrEMBL databases and comprises more than 12,000 strain-specific in silico profiles, each containing tens of thousands of peptide mass entries. Identification analysis involves computation of score values derived from correlation coefficients between experimental and strain-specific in silico peptide mass profiles and compilation of score ranking lists. The taxonomic positions of the microbial samples are then determined by using the best-matching database entries. The suggested method is computationally efficient (less than 2 min per sample) and has been successfully tested on a test set of 39 LC-MS1 peak lists obtained from 19 different microbial pathogens. The proposed method is rapid, simple and automatable and we foresee wide application potential for future microbiological applications.
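The scoring and ranking idea described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract says scores are derived from correlation coefficients, but the binning of masses into a binary occupancy vector, the use of a plain Pearson coefficient as the score, the mass range, and all function names and toy masses below are hypothetical choices, not the authors' implementation.

```python
import math


def bin_masses(masses, lo=500.0, hi=5000.0, width=1.0):
    """Convert a peptide mass list into a binary occupancy vector (assumed encoding)."""
    n_bins = int((hi - lo) / width)
    vec = [0.0] * n_bins
    for m in masses:
        if lo <= m < hi:
            vec[int((m - lo) / width)] = 1.0
    return vec


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no information
    return cov / math.sqrt(var_x * var_y)


def rank_library(experimental_masses, library):
    """Score every in silico profile against the experiment; best match first."""
    exp_vec = bin_masses(experimental_masses)
    scores = [
        (name, pearson(exp_vec, bin_masses(masses)))
        for name, masses in library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy library with two strain profiles and one measured peak list.
library = {
    "strain_A": [800.4, 1200.6, 1500.7],
    "strain_B": [900.1, 2000.2, 3000.3],
}
experimental = [800.4, 1200.6, 1500.7, 2500.0]
ranking = rank_library(experimental, library)
```

The first entry of `ranking` is the best-matching profile; in a real library with thousands of profiles the top of this list would be used to assign the taxonomic position.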
Affiliation(s)
- Peter Lasch
  - Robert Koch-Institute, ZBS6, Proteomics and Spectroscopy, Berlin, Germany
- Andy Schneider
  - Robert Koch-Institute, ZBS6, Proteomics and Spectroscopy, Berlin, Germany
- Joerg Doellinger
  - Robert Koch-Institute, ZBS6, Proteomics and Spectroscopy, Berlin, Germany
6. Griss J, Viteri G, Sidiropoulos K, Nguyen V, Fabregat A, Hermjakob H. ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis. Mol Cell Proteomics 2020; 19:2115-2125. [PMID: 32907876 PMCID: PMC7710148 DOI: 10.1074/mcp.tir120.002155] [Citation(s) in RCA: 122] [Impact Index Per Article: 30.5] [Received: 06/02/2020] [Revised: 07/28/2020] [Indexed: 01/27/2023] [Open Access]
Abstract
Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge. Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species are automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses. We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell rich and poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.
Affiliation(s)
- Johannes Griss
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
  - Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Guilherme Viteri
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Konstantinos Sidiropoulos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Vy Nguyen
  - Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Antonio Fabregat
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
7. Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022] [Open Access]
Abstract
BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Affiliation(s)
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
  - European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Alberto Magi
  - Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
- Vladimir Benes
  - European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
- Tobias Rausch
  - European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
8. Robin T, Mariethoz J, Lisacek F. Examining and Fine-tuning the Selection of Glycan Compositions with GlyConnect Compozitor. Mol Cell Proteomics 2020; 19:1602-1618. [PMID: 32636234 PMCID: PMC8014996 DOI: 10.1074/mcp.ra120.002041] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/16/2020] [Revised: 07/01/2020] [Indexed: 01/22/2023] [Open Access]
Abstract
A key point in achieving accurate intact glycopeptide identification is the definition of the glycan composition file that is used to match experimental with theoretical masses by a glycoproteomics search engine. At present, these files are mainly built from searching the literature and/or querying data sources focused on posttranslational modifications. Most glycoproteomics search engines include a default composition file that is readily used when processing MS data. We introduce here a glycan composition visualizing and comparative tool associated with the GlyConnect database and called GlyConnect Compozitor. It offers a web interface through which the database can be queried to bring out contextual information relative to a set of glycan compositions. The tool takes advantage of compositions being related to one another through shared monosaccharide counts and outputs interactive graphs summarizing information searched in the database. These results provide a guide for selecting or deselecting compositions in a file in order to reflect the context of a study as closely as possible. They also confirm the consistency of a set of compositions based on the content of the GlyConnect database. As part of the tool collection of the Glycomics@ExPASy initiative, Compozitor is hosted at https://glyconnect.expasy.org/compozitor/ where it can be run as a web application. It is also directly accessible from the GlyConnect database.
Affiliation(s)
- Thibault Robin
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Geneva, Switzerland
  - Computer Science Dept., Faculty of Science, University of Geneva, Switzerland
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Geneva, Switzerland
  - Microbiology and Molecular Medicine Dept., Faculty of Medicine, University of Geneva, Switzerland
- Julien Mariethoz
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Geneva, Switzerland
  - Computer Science Dept., Faculty of Science, University of Geneva, Switzerland
- Frédérique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Geneva, Switzerland
  - Computer Science Dept., Faculty of Science, University of Geneva, Switzerland
  - Section of Biology, Faculty of Science, University of Geneva, Switzerland
9. Huang T, Choi M, Tzouros M, Golling S, Pandya NJ, Banfai B, Dunkley T, Vitek O. MSstatsTMT: Statistical Detection of Differentially Abundant Proteins in Experiments with Isobaric Labeling and Multiple Mixtures. Mol Cell Proteomics 2020; 19:1706-1723. [PMID: 32680918 PMCID: PMC8015007 DOI: 10.1074/mcp.ra120.002105] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 06/15/2020] [Revised: 07/09/2020] [Indexed: 11/06/2022] [Open Access]
Abstract
Tandem mass tag (TMT) is a multiplexing technology widely used in proteomic research. It enables relative quantification of proteins from multiple biological samples in a single MS run with high efficiency and high throughput. However, experiments often require more biological replicates or conditions than can be accommodated by a single run, and involve multiple TMT mixtures and multiple runs. Such larger-scale experiments combine sources of biological and technical variation in patterns that are complex, unique to TMT-based workflows, and challenging for the downstream statistical analysis. These patterns cannot be adequately characterized by statistical methods designed for other technologies, such as label-free proteomics or transcriptomics. This manuscript proposes a general statistical approach for relative protein quantification in MS-based experiments with TMT labeling. It is applicable to experiments with multiple conditions, multiple biological replicate runs and multiple technical replicate runs, and unbalanced designs. It is based on a flexible family of linear mixed-effects models that handle complex patterns of technical artifacts and missing values. The approach is implemented in MSstatsTMT, a freely available open-source R/Bioconductor package compatible with data processing tools such as Proteome Discoverer, MaxQuant, OpenMS, and SpectroMine. Evaluation on a controlled mixture, simulated datasets, and three biological investigations with diverse designs demonstrated that MSstatsTMT balanced the sensitivity and the specificity of detecting differentially abundant proteins in large-scale experiments with multiple biological mixtures.
Affiliation(s)
- Ting Huang
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Meena Choi
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Manuel Tzouros
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland
- Sabrina Golling
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland
- Nikhil Janak Pandya
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland
- Balazs Banfai
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland
- Tom Dunkley
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland
- Olga Vitek
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
10. Sticker A, Goeminne L, Martens L, Clement L. Robust Summarization and Inference in Proteome-wide Label-free Quantification. Mol Cell Proteomics 2020; 19:1209-1219. [PMID: 32321741 PMCID: PMC7338080 DOI: 10.1074/mcp.ra119.001624] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/20/2019] [Revised: 04/20/2020] [Indexed: 12/27/2022] [Open Access]
Abstract
Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.
Collapse
Affiliation(s)
- Adriaan Sticker
- Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Ludger Goeminne
- Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
| | - Lieven Clement
- Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
| |
|
11
|
Prianichnikov N, Koch H, Koch S, Lubeck M, Heilig R, Brehmer S, Fischer R, Cox J. MaxQuant Software for Ion Mobility Enhanced Shotgun Proteomics. Mol Cell Proteomics 2020; 19:1058-1069. [PMID: 32156793 PMCID: PMC7261821 DOI: 10.1074/mcp.tir119.001720] [Citation(s) in RCA: 91] [Impact Index Per Article: 22.8] [Received: 08/09/2019] [Revised: 01/31/2020] [Indexed: 01/08/2023] Open
Abstract
Ion mobility can add a dimension to LC-MS based shotgun proteomics which has the potential to boost proteome coverage, quantification accuracy and dynamic range. Required for this is suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility and signal intensity. Here we describe the ion mobility enhanced MaxQuant software, which utilizes the added data dimension. It offers an end-to-end computational workflow for the identification and quantification of peptides and proteins in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks which are assembled to isotope patterns. Masses are recalibrated with a non-linear m/z, retention time, ion mobility and signal intensity dependent model, based on peptides from the sample. A new matching between runs (MBR) algorithm that utilizes collisional cross section (CCS) values of MS1 features in the matching process significantly gains specificity from the extra dimension. Prerequisite for using CCS values in MBR is a relative alignment of the ion mobility values between the runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS aware MBR. MS1 level label-free quantification is also implemented which proves to be highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.
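The way an extra CCS dimension can add specificity to matching between runs can be sketched as a simple tolerance-window filter. This is an illustrative Python sketch and not MaxQuant's algorithm: the `Feature` class, the `match_feature` function and all tolerance defaults are hypothetical, and retention time and CCS values are assumed to be already aligned between runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float   # mass-to-charge ratio
    rt: float   # retention time in minutes (assumed aligned between runs)
    ccs: float  # collisional cross section (assumed aligned between runs)


def match_feature(identified, candidates, ppm_tol=10.0, rt_tol=0.5, ccs_rel_tol=0.02):
    """Return the candidates from another run that fall inside all three windows."""
    matches = []
    for c in candidates:
        mz_ok = abs(c.mz - identified.mz) / identified.mz * 1e6 <= ppm_tol
        rt_ok = abs(c.rt - identified.rt) <= rt_tol
        ccs_ok = abs(c.ccs - identified.ccs) / identified.ccs <= ccs_rel_tol
        if mz_ok and rt_ok and ccs_ok:
            matches.append(c)
    return matches
```

Passing `ccs_rel_tol=float("inf")` disables the CCS window, which makes it easy to compare how many candidates survive with and without the extra dimension.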
Affiliation(s)
- Nikita Prianichnikov
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
| | - Heiner Koch
- Bruker Daltonik GmbH, Fahrenheitstr. 4, 28359 Bremen, Germany
| | - Scarlet Koch
- Bruker Daltonik GmbH, Fahrenheitstr. 4, 28359 Bremen, Germany
| | - Markus Lubeck
- Bruker Daltonik GmbH, Fahrenheitstr. 4, 28359 Bremen, Germany
| | - Raphael Heilig
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, United Kingdom
| | - Sven Brehmer
- Bruker Daltonik GmbH, Fahrenheitstr. 4, 28359 Bremen, Germany
| | - Roman Fischer
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, United Kingdom
| | - Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany; Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
| |
|
12
|
Shu Q, Li M, Shu L, An Z, Wang J, Lv H, Yang M, Cai T, Hu T, Fu Y, Yang F. Large-scale Identification of N-linked Intact Glycopeptides in Human Serum using HILIC Enrichment and Spectral Library Search. Mol Cell Proteomics 2020; 19:672-689. [PMID: 32102970 PMCID: PMC7124471 DOI: 10.1074/mcp.ra119.001791] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/23/2019] [Revised: 02/10/2020] [Indexed: 11/12/2022] Open
Abstract
Large-scale identification of N-linked intact glycopeptides by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) in human serum is challenging because of the wide dynamic range of serum protein abundances, the lack of a complete serum N-glycan database and the existence of proteoforms. In this regard, a spectral library search method was presented for the identification of N-linked intact glycopeptides from N-linked glycoproteins in human serum with target-decoy and motif-specific false discovery rate (FDR) control. Serum proteins were firstly separated into low-abundance and high-abundance proteins by acetonitrile (ACN) precipitation. After digestion, the N-linked intact glycopeptides were enriched by hydrophilic interaction liquid chromatography (HILIC) and a portion of the enriched N-linked intact glycopeptides were processed by Peptide-N-Glycosidase F (PNGase F) to generate N-linked deglycopeptides. Both N-linked intact glycopeptides and deglycopeptides were analyzed by LC-MS/MS. From N-linked deglycopeptides data sets, 764 N-linked glycoproteins, 1699 N-linked glycosites and 3328 unique N-linked deglycopeptides were identified. Four types of N-linked glycosylation motifs (NXS/T/C/V, X≠P) were used to recognize the N-linked deglycopeptides. The spectra of these N-linked deglycopeptides were utilized for N-linked deglycopeptides library construction and identification of N-linked intact glycopeptides. A database containing 739 N-glycan masses was constructed and utilized during spectral library search for the identification of N-linked intact glycopeptides. In total, 526 N-linked glycoproteins, 1036 N-linked glycosites, 22,677 N-linked intact glycopeptides and 738 N-glycan masses were identified under 1% FDR, representing the most in-depth serum N-glycoproteome identified by LC-MS/MS at N-linked intact glycopeptide level.
Affiliation(s)
- Qingbo Shu
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
| | - Mengjie Li
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
| | - Lian Shu
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhiwu An
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
| | - Jifeng Wang
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Hao Lv
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112; Research Center for Basic Sciences of Medicine, Basic Medical College, Guizhou Medical University, Guiyang 550025, China
| | - Ming Yang
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
| | - Tanxi Cai
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
| | - Tony Hu
- National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| | - Yan Fu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China; Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112.
| | - Fuquan Yang
- Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
13
|
Larsen DN, Mikkelsen CE, Kierkegaard M, Bereta GP, Nowakowska Z, Kaczmarek JZ, Potempa J, Højrup P. Citrullinome of Porphyromonas gingivalis Outer Membrane Vesicles: Confident Identification of Citrullinated Peptides. Mol Cell Proteomics 2020; 19:167-180. [PMID: 31754044 PMCID: PMC6944236 DOI: 10.1074/mcp.ra119.001700] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/25/2019] [Revised: 11/12/2019] [Indexed: 12/20/2022] Open
Abstract
Porphyromonas gingivalis is a key pathogen in chronic periodontitis and has recently been mechanistically linked to the development of rheumatoid arthritis via the activity of peptidyl arginine deiminase generating citrullinated epitopes in the periodontium. In this project the outer membrane vesicles (OMV) from P. gingivalis W83 wild-type (WT), a W83 knock-out mutant of peptidyl arginine deiminase (ΔPPAD), and a mutant strain expressing PPAD with the active site cysteine mutated to alanine (C351A), have been analyzed using a two-dimensional HFBA-based separation system combined with LC-MS. For optimal and positive identification and validation of citrullinated peptides and proteins, high resolution mass spectrometers and strict MS search criteria were utilized. This may have compromised the total number of identified citrullinations but increased the confidence of the validation. A new two-dimensional separation system proved to increase the strength of validation, and along with the use of an in-house built program, Citrullia, we establish a fast and easy semi-automatic (manual) validation of citrullinated peptides. For the WT OMV we identified 78 citrullinated proteins having a total of 161 citrullination sites. Notably, in keeping with the mechanism of OMV formation, the majority (51 out of 78) of citrullinated proteins were predicted to be exported via the inner membrane and to reside in the periplasm or to be translocated to the bacterial surface. Citrullinated surface proteins may contribute to the pathogenesis of rheumatoid arthritis. For the C351A-OMV a single citrullination site was found and no citrullinations were identified for the ΔPPAD-OMV, thus validating the unbiased character of our method of citrullinated peptide identification.
Affiliation(s)
| | - Grzegorz P Bereta
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Krakow, Poland
| | - Zuzanna Nowakowska
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Krakow, Poland; Malopolska Center of Biotechnology, Jagiellonian University, Krakow, Poland
| | - Jakub Z Kaczmarek
- Research and Development Department, Ovodan Biotech A/S, 5000 Odense, Denmark
| | - Jan Potempa
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Krakow, Poland; Department of Oral Immunology and Infectious Diseases, University of Louisville School of Dentistry, 501 S. Preston St., Louisville, Kentucky
| | - Peter Højrup
- University of Southern Denmark, Campusvej 55, Odense M, Denmark.
| |
|
14
|
Alvarez B, Reynisson B, Barra C, Buus S, Ternette N, Connelley T, Andreatta M, Nielsen M. NNAlign_MA; MHC Peptidome Deconvolution for Accurate MHC Binding Motif Characterization and Improved T-cell Epitope Predictions. Mol Cell Proteomics 2019; 18:2459-2477. [PMID: 31578220 PMCID: PMC6885703 DOI: 10.1074/mcp.tir119.001658] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 07/04/2019] [Revised: 09/25/2019] [Indexed: 01/03/2023] Open
Abstract
The set of peptides presented on a cell's surface by MHC molecules is known as the immunopeptidome. Current mass spectrometry technologies allow for identification of large peptidomes, and studies have proven these data to be a rich source of information for learning the rules of MHC-mediated antigen presentation. Immunopeptidomes are usually poly-specific, containing multiple sequence motifs matching the MHC molecules expressed in the system under investigation. Motif deconvolution -the process of associating each ligand to its presenting MHC molecule(s)- is therefore a critical and challenging step in the analysis of MS-eluted MHC ligand data. Here, we describe NNAlign_MA, a computational method designed to address this challenge and fully benefit from large, poly-specific data sets of MS-eluted ligands. NNAlign_MA simultaneously performs the tasks of (1) clustering peptides into individual specificities; (2) automatic annotation of each cluster to an MHC molecule; and (3) training of a prediction model covering all MHCs present in the training set. NNAlign_MA was benchmarked on large and diverse data sets, covering class I and class II data. In all cases, the method was demonstrated to outperform state-of-the-art methods, effectively expanding the coverage of alleles for which accurate predictions can be made, resulting in improved identification of both eluted ligands and T-cell epitopes. Given its high flexibility and ease of use, we expect NNAlign_MA to serve as an effective tool to increase our understanding of the rules of MHC antigen presentation and guide the development of novel T-cell-based therapeutics.
Affiliation(s)
- Bruno Alvarez
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Argentina
| | - Birkir Reynisson
- Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
| | - Carolina Barra
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Argentina
| | - Søren Buus
- Department of Immunology and Microbiology, Faculty of Health Sciences, University of Copenhagen, Denmark
| | - Nicola Ternette
- The Jenner Institute, Nuffield Department of Medicine, Oxford, United Kingdom
| | - Tim Connelley
- Roslin Institute, Edinburgh, Midlothian, United Kingdom
| | - Massimo Andreatta
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Argentina
| | - Morten Nielsen
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Argentina; Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark.
| |
|
15
|
Michalak W, Tsiamis V, Schwämmle V, Rogowska-Wrzesińska A. ComplexBrowser: A Tool for Identification and Quantification of Protein Complexes in Large-scale Proteomics Datasets. Mol Cell Proteomics 2019; 18:2324-2334. [PMID: 31447428 PMCID: PMC6823858 DOI: 10.1074/mcp.tir119.001434] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/07/2019] [Revised: 07/27/2019] [Indexed: 12/25/2022] Open
Abstract
We have developed ComplexBrowser, an open source, online platform for supervised analysis of quantitative proteomic data (label free and isobaric mass tag based) that focuses on protein complexes. The software uses manually curated information from CORUM and Complex Portal databases to identify protein complex components. For the first time, we provide a Complex Fold Change (CFC) factor that identifies up- and downregulated complexes based on the level of complex subunits coregulation. The software provides interactive visualization of protein complexes' composition and expression for exploratory analysis and incorporates a quality control step that includes normalization and statistical analysis based on the limma package. ComplexBrowser was tested on two published studies identifying changes in protein expression within either human adenocarcinoma tissue or activated mouse T-cells. The analysis revealed 1519 and 332 protein complexes, of which 233 and 41 were found coordinately regulated in the respective studies. The adopted approach provided evidence for a shift to glucose-based metabolism and high proliferation in adenocarcinoma tissues, and the identification of chromatin remodeling complexes involved in mouse T-cell activation. The results correlate with the original interpretation of the experiments and provide novel biological details about the protein complexes affected. ComplexBrowser is, to our knowledge, the first tool to automate quantitative protein complex analysis for high-throughput studies, providing insights into protein complex regulation within minutes of analysis.
Affiliation(s)
- Wojciech Michalak
- Department of Biochemistry & Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark
| | - Vasileios Tsiamis
- Department of Biochemistry & Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark
| | - Veit Schwämmle
- Department of Biochemistry & Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark
| | - Adelina Rogowska-Wrzesińska
- Department of Biochemistry & Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark.
| |
|
16
|
Ammar C, Gruber M, Csaba G, Zimmer R. MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins. Mol Cell Proteomics 2019; 18:1880-1892. [PMID: 31235637 PMCID: PMC6731086 DOI: 10.1074/mcp.ra119.001509] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/25/2019] [Revised: 06/12/2019] [Indexed: 11/06/2022] Open
Abstract
Mass spectrometry based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study over multiple patients. At the same time, we observe that even the proteins classified as most insignificant by other methods but significant by MS-EmpiRe show very clear regulation on the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs (3 h gradients)) and is publicly available under github.com/zimmerlab/MS-EmpiRe with a manual including examples.
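The general idea of scoring a peptide fold change against an empirical error distribution can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the MS-EmpiRe implementation: it assumes the background is a flat list of log2 fold changes observed between replicates of the same condition, and it leaves out the intensity dependence and per-protein combination described in the abstract. The function name is hypothetical.

```python
def empirical_p_value(observed_log2fc, background_log2fcs):
    """Two-sided empirical p-value of a fold change against replicate noise.

    background_log2fcs: log2 fold changes measured between replicates of the
    same condition, used here as the empirical error distribution.
    """
    n = len(background_log2fcs)
    extreme = sum(
        1 for fc in background_log2fcs if abs(fc) >= abs(observed_log2fc)
    )
    # The add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n + 1)
```

A fold change larger than almost all replicate-to-replicate differences receives a small p-value, while one that sits inside the noise receives a value close to 1.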
Affiliation(s)
- Constantin Ammar
- Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
| | - Markus Gruber
- Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany
| | - Gergely Csaba
- Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany
| | - Ralf Zimmer
- Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany.
| |
|
17
|
Shabardina V, Kischka T, Manske F, Grundmann N, Frith MC, Suzuki Y, Makałowski W. NanoPipe-a web server for nanopore MinION sequencing data analysis. Gigascience 2019; 8:giy169. [PMID: 30689855 PMCID: PMC6377397 DOI: 10.1093/gigascience/giy169] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/10/2018] [Revised: 12/10/2018] [Accepted: 12/23/2018] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND The fast-moving progress of the third-generation long-read sequencing technologies will soon bring the biological and medical sciences to a new era of research. Altogether, the technique and experimental procedures are becoming more straightforward and available to biologists from diverse fields, even without any profound experience in DNA sequencing. Thus, the introduction of the MinION device by Oxford Nanopore Technologies promises to "bring sequencing technology to the masses" and also allows quick and operative analysis in field studies. However, the convenience of this sequencing technology dramatically contrasts with the available analysis tools, which may significantly reduce enthusiasm of a "regular" user. To really bring the sequencing technology to every biologist, we need a set of user-friendly tools that can perform a powerful analysis in an automatic manner. FINDINGS NanoPipe was developed in consideration of the specifics of the MinION sequencing technologies, providing accordingly adjusted alignment parameters. The range of the target species/sequences for the alignment is not limited, and the descriptive usage page of NanoPipe helps a user to succeed with NanoPipe analysis. The results contain alignment statistics, consensus sequence, polymorphisms data, and visualization of the alignment. Several test cases are used to demonstrate the efficiency of the tool. CONCLUSIONS Freely available NanoPipe software allows effortless and reliable analysis of MinION sequencing data for experienced bioinformaticians, as well for wet-lab biologists with minimum bioinformatics knowledge. Moreover, for the latter group, we describe the basic algorithm necessary for MinION sequencing analysis from the first to last step.
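One of the outputs listed in the abstract is a consensus sequence. As a rough illustration of what that means, the Python sketch below takes reads that are already aligned to the same coordinates and picks the most frequent base per column. It is a generic majority-vote example under stated assumptions (equal-length rows, `-` for gaps, `N` where no read covers a column) and does not describe NanoPipe's own procedure.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, gap-padded aligned reads."""
    result = []
    for column in zip(*aligned_reads):
        # Count bases in this column, ignoring gap characters.
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)
```

Columns where a read disagrees with the consensus are the natural starting point for the polymorphism report the abstract mentions.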
Affiliation(s)
- Victoria Shabardina
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Tabea Kischka
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Felix Manske
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Norbert Grundmann
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Martin C Frith
- Artificial Intelligence Research Center, AIST, 2-3-26, Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- AIST-Waseda University Computational Bio Big Data Open Innovation Laboratory, 3-4-1 Ookubo, Shinjuku-ku, Tokyo, 169-8555, Japan
| | - Yutaka Suzuki
- Laboratory of Systems Genomics, Department of Computational Biology and Medical Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
| | - Wojciech Makałowski
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| |
|
18
|
Stanfill BA, Nakayasu ES, Bramer LM, Thompson AM, Ansong CK, Clauss TR, Gritsenko MA, Monroe ME, Moore RJ, Orton DJ, Piehowski PD, Schepmoes AA, Smith RD, Webb-Robertson BJM, Metz TO. Quality Control Analysis in Real-time (QC-ART): A Tool for Real-time Quality Control Assessment of Mass Spectrometry-based Proteomics Data. Mol Cell Proteomics 2018; 17:1824-1836. [PMID: 29666158 PMCID: PMC6126382 DOI: 10.1074/mcp.ra118.000648] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/31/2018] [Revised: 03/13/2018] [Indexed: 12/29/2022] Open
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based proteomics studies of large sample cohorts can easily require from months to years to complete. Acquiring consistent, high-quality data in such large-scale studies is challenging because of normal variations in instrumentation performance over time, as well as artifacts introduced by the samples themselves, such as those arising from collection, storage and processing. Existing quality control methods for proteomics data primarily focus on post-hoc analysis to remove low-quality data that would degrade downstream statistics; they are not designed to evaluate the data in near real-time, which would allow for interventions as soon as deviations in data quality are detected. In addition to flagging analyses that demonstrate outlier behavior, evaluating how the data structure changes over time can aid in understanding typical instrument performance or identify issues such as a degradation in data quality caused by the need for instrument cleaning and/or re-calibration. To address this gap for proteomics, we developed Quality Control Analysis in Real-Time (QC-ART), a tool for evaluating data as they are acquired to dynamically flag potential issues with instrument performance or sample quality. QC-ART has accuracy similar to that of standard post-hoc analysis methods with the additional benefit of real-time analysis. We demonstrate the utility and performance of QC-ART in identifying deviations in data quality caused by both instrument and sample issues in near real-time for LC-MS-based plasma proteomics analyses of a sample subset of The Environmental Determinants of Diabetes in the Young cohort. We also present a case where QC-ART facilitated the identification of oxidative modifications, which are often underappreciated in proteomic experiments.
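Flagging a run as soon as it is acquired can be illustrated with a small Python sketch that compares each new quality metric with a baseline of earlier runs. The robust z-score based on the median and the median absolute deviation (MAD) used here is a generic outlier rule chosen for illustration; it is an assumption and not the scoring method implemented in QC-ART. The function name and the threshold default are hypothetical.

```python
from statistics import median


def flag_run(baseline, new_value, threshold=3.5):
    """Return True if new_value is an outlier relative to the baseline runs.

    baseline: QC metric values from previously accepted runs.
    The score is the modified z-score 0.6745 * (x - median) / MAD.
    """
    center = median(baseline)
    mad = median([abs(x - center) for x in baseline])
    if mad == 0:
        # Degenerate baseline: any departure from the constant value is flagged.
        return new_value != center
    robust_z = 0.6745 * (new_value - center) / mad
    return abs(robust_z) > threshold
```

In a streaming setting the baseline would be updated with each accepted run, so that slow drifts in instrument performance become visible as a trend in the scores.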
Affiliation(s)
| | - Lisa M Bramer
- Computational and Statistical Analytics Division
| | - Allison M Thompson
- Environmental and Molecular Sciences Laboratory, 902 Battelle Blvd, Pacific Northwest National Laboratory, Richland, Washington
| |
|
19
|
Deutsch EW, Orchard S, Binz PA, Bittremieux W, Eisenacher M, Hermjakob H, Kawano S, Lam H, Mayer G, Menschaert G, Perez-Riverol Y, Salek RM, Tabb DL, Tenzer S, Vizcaíno JA, Walzer M, Jones AR. Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. J Proteome Res 2017; 16:4288-4298. [PMID: 28849660 PMCID: PMC5715286 DOI: 10.1021/acs.jproteome.7b00370] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Indexed: 12/21/2022]
Abstract
The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, cochairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Furthermore, new standards are currently either in the final stages of completion (proBed and proBAM for proteogenomics results as well as PEFF) or in early stages of design (a spectral library standard format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PROXI) web services Application Programming Interface). In this work we review the current status of all of these aspects of the PSI, describe synergies with other efforts such as the ProteomeXchange Consortium, the Human Proteome Project, and the metabolomics community, and provide a look at future directions of the PSI.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
| | - Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
| | - Martin Eisenacher
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom; State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences, Beijing, Beijing 102206, China
| | - Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
| | - Henry Lam
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China; Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China
| | - Gerhard Mayer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
| | - Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics (BioBix), Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Reza M Salek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - David L Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Stefan Tenzer
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) , Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) , Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Andrew R Jones
- Institute of Integrative Biology, University of Liverpool , South Wirral L64 4AY, United Kingdom
| |
|
20
|
Ficarro SB, Alexander WM, Marto JA. mzStudio: A Dynamic Digital Canvas for User-Driven Interrogation of Mass Spectrometry Data. Proteomes 2017; 5:proteomes5030020. [PMID: 28763045 PMCID: PMC5620537 DOI: 10.3390/proteomes5030020] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/07/2017] [Revised: 07/14/2017] [Accepted: 07/27/2017] [Indexed: 11/17/2022] Open
Abstract
Although not yet truly ‘comprehensive’, modern mass spectrometry-based experiments can generate quantitative data for a meaningful fraction of the human proteome. Importantly for large-scale protein expression analysis, robust data pipelines are in place for identification of un-modified peptide sequences and aggregation of these data to protein-level quantification. However, interoperable software tools that enable scientists to computationally explore and document novel hypotheses for peptide sequence, modification status, or fragmentation behavior are not well-developed. Here, we introduce mzStudio, an open-source Python module built on our multiplierz project. This desktop application provides a highly-interactive graphical user interface (GUI) through which scientists can examine and annotate spectral features, re-search existing PSMs to test different modifications or new spectral matching algorithms, share results with colleagues, integrate other domain-specific software tools, and finally create publication-quality graphics. mzStudio leverages our common application programming interface (mzAPI) for access to native data files from multiple instrument platforms, including ion trap, quadrupole time-of-flight, Orbitrap, matrix-assisted laser desorption ionization, and triple quadrupole mass spectrometers and is compatible with several popular search engines including Mascot, Proteome Discoverer, X!Tandem, and Comet. The mzStudio toolkit enables researchers to create a digital provenance of data analytics and other evidence that support specific peptide sequence assignments.
Affiliation(s)
- Scott B Ficarro
- Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02115, USA.
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02215, USA.
- William M Alexander
- Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02115, USA.
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02215, USA.
- Jarrod A Marto
- Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02115, USA.
- Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02215, USA.
|
21
|
Assenov Y, Müller F, Lutsik P, Walter J, Lengauer T, Bock C. Comprehensive analysis of DNA methylation data with RnBeads. Nat Methods 2014; 11:1138-1140. [PMID: 25262207 PMCID: PMC4216143 DOI: 10.1038/nmeth.3115] [Citation(s) in RCA: 448] [Impact Index Per Article: 44.8] [Received: 05/06/2014] [Accepted: 08/19/2014] [Indexed: 01/07/2023]
Abstract
RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/). Supported assays include whole-genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays and any other protocol that produces high-resolution DNA methylation data. Notable applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts.
Affiliation(s)
- Yassen Assenov
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Fabian Müller
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Pavlo Lutsik
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Jörn Walter
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Christoph Bock
- Max Planck Institute for Informatics, Saarbrücken, Germany
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
|
22
|
Abstract
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics.
|
23
|
Affiliation(s)
- David Roy Smith
- Department of Biology, University of Western Ontario, London, ON, Canada
|