1. Ulrich JU, Epping L, Pilz T, Walther B, Stingl K, Semmler T, Renard BY. Nanopore adaptive sampling effectively enriches bacterial plasmids. mSystems 2024; 9:e0094523. [PMID: 38376263 PMCID: PMC10949517 DOI: 10.1128/msystems.00945-23] [Received: 09/04/2023] [Accepted: 01/23/2024] [Indexed: 02/21/2024]
Abstract
Bacterial plasmids play a major role in the spread of antibiotic resistance genes. However, their characterization via DNA sequencing suffers from the low abundance of plasmid DNA in those samples. Although sample preparation methods can enrich the proportion of plasmid DNA before sequencing, these methods are expensive and laborious, and they might introduce a bias by enriching only for specific plasmid DNA sequences. Nanopore adaptive sampling could overcome these issues by rejecting uninteresting DNA molecules during the sequencing process. In this study, we assess the application of adaptive sampling for the enrichment of low-abundant plasmids in known bacterial isolates using two different adaptive sampling tools. We show that a significant enrichment can be achieved even on expired flow cells. By applying adaptive sampling, we also improve the quality of de novo plasmid assemblies and reduce the sequencing time. However, our experiments also highlight issues with adaptive sampling if target and non-target sequences span similar regions. IMPORTANCE Antimicrobial resistance causes millions of deaths every year. Mobile genetic elements like bacterial plasmids are key drivers for the dissemination of antimicrobial resistance genes. This makes the characterization of plasmids via DNA sequencing an important tool for clinical microbiologists. Since plasmids are often underrepresented in bacterial samples, plasmid sequencing can be challenging and laborious. To accelerate the sequencing process, we evaluate nanopore adaptive sampling as an in silico method for the enrichment of low-abundant plasmids. Our results show the potential of this cost-efficient method for future plasmid research but also indicate issues that arise from using reference sequences.
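The abstract describes adaptive sampling as rejecting uninteresting molecules while they are being sequenced. The sketch below illustrates that decision in its simplest form. It assumes, hypothetically, that a read prefix has already been basecalled and that a read is kept when at least a chosen fraction of its k-mers occur in the target plasmid reference. The functions `kmers` and `decide`, the k-mer size, the threshold and the example sequences are illustrative assumptions, not the interface of either adaptive sampling tool evaluated in the study.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def decide(read_prefix: str, target_kmers: set[str], k: int = 5,
           min_fraction: float = 0.5) -> str:
    """Hypothetical adaptive-sampling rule.

    Returns "accept" (keep sequencing) if at least `min_fraction` of the
    prefix's k-mers occur in the target reference, "reject" (eject the
    molecule from the pore) otherwise, and "wait" if the prefix is still
    shorter than k, so no decision can be made yet.
    """
    read_kmers = kmers(read_prefix, k)
    if not read_kmers:
        return "wait"
    hits = sum(1 for km in read_kmers if km in target_kmers)
    return "accept" if hits / len(read_kmers) >= min_fraction else "reject"


# Hypothetical plasmid reference used as the enrichment target.
plasmid = "ATGCGTACGTTAGCCGATCGATCGGATCCA"
target = kmers(plasmid, 5)

print(decide("GCGTACGTTAGCC", target))   # prefix taken from the plasmid
print(decide("TTTTTTTTTTTTTT", target))  # unrelated (e.g. chromosomal) prefix
```

Because this toy rule only checks k-mer membership, a chromosomal read covering a region that also occurs in a plasmid would be accepted as well, which mirrors the difficulty the authors report when target and non-target sequences span similar regions.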
Affiliation(s)
- Jens-Uwe Ulrich
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany
- Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, Wildau, Germany
- Lennard Epping
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Tanja Pilz
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Birgit Walther
- Advanced Light and Electron Microscopy, Robert Koch Institute, Berlin, Germany
- Kerstin Stingl
- National Reference Laboratory for Campylobacter, Department of Biological Safety, German Federal Institute for Risk Assessment (BfR), Berlin, Germany
- Torsten Semmler
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
2. Nasri F, Kongkitimanon K, Wittig A, Cortés JS, Brinkmann A, Nitsche A, Schmachtenberg AJ, Renard BY, Fuchs S. MpoxRadar: a worldwide MPXV genomic surveillance dashboard. Nucleic Acids Res 2023:7160218. [PMID: 37167010 DOI: 10.1093/nar/gkad325] [Received: 01/31/2023] [Revised: 03/16/2023] [Accepted: 05/03/2023] [Indexed: 05/12/2023]
Abstract
The mpox virus (MPXV) is mutating at an exceptional rate for a DNA virus and its global spread is concerning, making genomic surveillance a necessity. With MpoxRadar, we provide an interactive dashboard to track virus variants at the mutation level worldwide. MpoxRadar allows users to select among different genomes as reference for comparison. The occurrence of mutation profiles based on the selected reference is indicated on an interactive world map that shows the respective geographic sampling site in customizable time ranges to easily follow the frequency or trend of defined mutations. Furthermore, the user can filter for specific mutations, genes, countries, genome types, and sequencing protocols and download the filtered data directly from MpoxRadar. On the server, we automatically download all MPXV genomes and metadata from the National Center for Biotechnology Information (NCBI) on a daily basis, align them to the different reference genomes, and generate mutation profiles, which are stored and linked to the available metadata in a database. This makes MpoxRadar a practical tool for the genomic surveillance of MPXV, supporting users with limited computational resources. MpoxRadar is open-source and freely accessible at https://MpoxRadar.net.
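MpoxRadar aligns each genome to a selected reference and derives mutation profiles from the alignment. The following sketch shows one way such a profile could be read off a pairwise alignment given as two gapped strings of equal length. The function name and the notation (`G3T` for a substitution, `del:7` for a deleted reference position) are hypothetical; insertions and ambiguous `N` calls are simply skipped here, which a production pipeline would need to handle explicitly.

```python
def mutation_profile(ref_aln: str, query_aln: str) -> list[str]:
    """List point mutations of an aligned query relative to an aligned reference.

    Positions are 1-based reference coordinates. Columns with a gap in the
    reference (insertions in the query) do not advance the coordinate and are
    not reported; 'N' in the query is treated as "no information".
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations: list[str] = []
    ref_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion relative to the reference; ignored in this sketch
        ref_pos += 1
        if q == "-":
            mutations.append(f"del:{ref_pos}")
        elif q != r and q != "N":
            mutations.append(f"{r}{ref_pos}{q}")
    return mutations


print(mutation_profile("ACGT-ACGT", "ACTTAAC-T"))
```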
Affiliation(s)
- Ferdous Nasri
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Kunaphas Kongkitimanon
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Alice Wittig
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Genome Competence Center (MF1), Robert Koch Institute, Seestraße 10, 13353 Berlin, Germany
- Jorge Sánchez Cortés
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Annika Brinkmann
- Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Seestrasse 10, Berlin 13353, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Seestrasse 10, Berlin 13353, Germany
- Anna-Juliane Schmachtenberg
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Bernhard Y Renard
- Data Analytics & Computational Statistics, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Stephan Fuchs
- Genome Competence Center (MF1), Robert Koch Institute, Seestraße 10, 13353 Berlin, Germany
3. Garrels T, Khodabakhsh A, Renard BY, Baum K. LazyFox: fast and parallelized overlapping community detection in large graphs. PeerJ Comput Sci 2023; 9:e1291. [PMID: 37346513 PMCID: PMC10280410 DOI: 10.7717/peerj-cs.1291] [Received: 11/28/2022] [Accepted: 02/20/2023] [Indexed: 06/23/2023]
Abstract
The detection of communities in graph datasets provides insight into a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecasting, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, Fox, that detects such overlapping communities. Fox measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LazyFox, a multi-threaded adaptation of the Fox algorithm, which provides even faster detection without an impact on community quality. This allows for the analysis of significantly larger and more complex datasets. LazyFox enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LazyFox's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
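The closeness measure mentioned above rests on counting the triangles a node forms with a community. The sketch below computes that count exactly on a small graph stored as an adjacency dictionary; Fox approximates this quantity rather than computing it this way, and LazyFox's multi-threaded implementation is not reproduced here. The function names and the example graph are illustrative.

```python
from itertools import combinations


def build_adjacency(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Undirected adjacency sets from an edge list."""
    adj: dict[int, set[int]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangles_with_community(adj: dict[int, set[int]], v: int,
                             community: set[int]) -> int:
    """Count triangles (v, u, w) in which u and w are community members.

    Only neighbours of v inside the community can close such a triangle, so
    every pair of them is checked for a u-w edge.
    """
    members = sorted((adj.get(v, set()) & community) - {v})
    return sum(1 for u, w in combinations(members, 2) if w in adj[u])


adj = build_adjacency([(0, 1), (0, 2), (1, 2), (0, 3), (2, 3), (3, 4)])
community = {1, 2, 3}
print(triangles_with_community(adj, 0, community))  # node 0 closes two triangles
print(triangles_with_community(adj, 4, community))  # node 4 has one neighbour in C
```

Each node's count only reads the shared graph, so counts for different nodes can in principle be computed independently, which is the kind of work a multi-threaded implementation can distribute across threads.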
Affiliation(s)
- Tim Garrels
- Hasso Plattner Institute for Digital Engineering gGmbH, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Athar Khodabakhsh
- Hasso Plattner Institute for Digital Engineering gGmbH, Potsdam, Germany
- Bernhard Y. Renard
- Hasso Plattner Institute for Digital Engineering gGmbH, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Department of Mathematics and Computer Science, Free University Berlin, Berlin, Germany
- Katharina Baum
- Hasso Plattner Institute for Digital Engineering gGmbH, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, USA
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, USA
4. Akter F, Bonini S, Ponnaiyan S, Kögler-Mohrbacher B, Bleibaum F, Damme M, Renard BY, Winter D. Multi-Cell Line Analysis of Lysosomal Proteomes Reveals Unique Features and Novel Lysosomal Proteins. Mol Cell Proteomics 2023; 22:100509. [PMID: 36791992 PMCID: PMC10025164 DOI: 10.1016/j.mcpro.2023.100509] [Received: 06/22/2022] [Revised: 02/01/2023] [Accepted: 02/06/2023] [Indexed: 02/15/2023]
Abstract
Lysosomes, the main degradative organelles of mammalian cells, play a key role in the regulation of metabolism. It is becoming more and more apparent that they are highly active, diverse, and involved in a large variety of processes. The essential role of lysosomes is exemplified by the detrimental consequences of their malfunction, which can result in lysosomal storage disorders, neurodegenerative diseases, and cancer. Using lysosome enrichment and mass spectrometry, we investigated the lysosomal proteomes of HEK293, HeLa, HuH-7, SH-SY5Y, MEF, and NIH3T3 cells. We provide evidence on a large scale for cell type-specific differences of lysosomes, showing that levels of distinct lysosomal proteins are highly variable within one cell type, while expression of others is highly conserved across several cell lines. Using differentially stable isotope-labeled cells and bimodal distribution analysis, we furthermore identify a high confidence population of lysosomal proteins for each cell line. Multi-cell line correlation of these data reveals potential novel lysosomal proteins, and we confirm lysosomal localization for six candidates. All data are available via ProteomeXchange with identifier PXD020600.
Affiliation(s)
- Fatema Akter
- Institute for Biochemistry and Molecular Biology, Medical Faculty, University of Bonn, Bonn, Germany
- Department of Pharmacology, Faculty of Veterinary Science, Bangladesh Agricultural University, Mymensingh, Bangladesh
- Sara Bonini
- Institute for Biochemistry and Molecular Biology, Medical Faculty, University of Bonn, Bonn, Germany
- Srigayatri Ponnaiyan
- Institute for Biochemistry and Molecular Biology, Medical Faculty, University of Bonn, Bonn, Germany
- Markus Damme
- Institute for Biochemistry, University of Kiel, Kiel, Germany
- Dominic Winter
- Institute for Biochemistry and Molecular Biology, Medical Faculty, University of Bonn, Bonn, Germany
5. Piro VC, Renard BY. Contamination detection and microbiome exploration with GRIMER. Gigascience 2022; 12:7094242. [PMID: 36994872 PMCID: PMC10061425 DOI: 10.1093/gigascience/giad017] [Received: 10/31/2022] [Revised: 02/06/2023] [Accepted: 03/01/2023] [Indexed: 03/31/2023]
Abstract
BACKGROUND Contamination detection is an important step that should be carefully considered in early stages when designing and performing microbiome studies to avoid biased outcomes. Detecting and removing true contaminants is challenging, especially in low-biomass samples or in studies lacking proper controls. Interactive visualizations and analysis platforms are crucial to better guide this step and to help identify noisy patterns that could potentially be contamination. Additionally, external evidence, like aggregation of several contamination detection methods and the use of common contaminants reported in the literature, could help to discover and mitigate contamination. RESULTS We propose GRIMER, a tool that performs automated analyses and generates a portable and interactive dashboard integrating annotation, taxonomy, and metadata. It unifies several sources of evidence to help detect contamination. GRIMER is independent of quantification methods and directly analyzes contingency tables to create an interactive and offline report. Reports can be created in seconds and are accessible for nonspecialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled and used an extensive list of possible external contaminant taxa and common contaminants with 210 genera and 627 species reported in 22 published articles. CONCLUSION GRIMER enables visual data exploration and analysis, supporting contamination detection in microbiome studies. The tool and data presented are open source and available at https://gitlab.com/dacs-hpi/grimer.
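GRIMER works directly on contingency tables (samples by observed taxa) and draws on lists of common contaminants reported in the literature. As a rough illustration of how those two inputs can be combined, the sketch below reports, for each sample, the share of counts assigned to taxa on an external contaminant list. The table layout, the function name, the example counts and the one-entry contaminant list are hypothetical and far simpler than GRIMER's interactive report.

```python
def contaminant_share(table: dict[str, dict[str, int]],
                      contaminants: set[str]) -> dict[str, float]:
    """Fraction of each sample's counts that fall on listed contaminant taxa.

    `table` maps sample name -> {taxon: count}, i.e. one row of a
    samples-by-taxa contingency table per sample.
    """
    shares: dict[str, float] = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        flagged = sum(n for taxon, n in counts.items() if taxon in contaminants)
        shares[sample] = flagged / total if total else 0.0
    return shares


# Illustrative counts; "blank" stands for a negative control sample.
table = {
    "s1": {"Ralstonia": 30, "Bacteroides": 70},
    "s2": {"Bacteroides": 50, "Prevotella": 50},
    "blank": {"Ralstonia": 9, "Bacteroides": 1},
}
print(contaminant_share(table, {"Ralstonia"}))
```

A high share in a negative control alongside a lower share in real samples is the sort of pattern such a view makes easy to spot; it is a hint worth investigating, not proof of contamination.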
Affiliation(s)
- Vitor C Piro
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
6. Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2022; 24:6955273. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022]
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
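The light-chain results are reported as sequence coverage. One common way to compute such a figure is the fraction of residues in the reference protein that are matched by at least one predicted or assembled sequence; the sketch below does this with exact substring matching. The function name and the toy sequence are hypothetical, and the study's exact coverage definition may differ.

```python
def sequence_coverage(protein: str, fragments: list[str]) -> float:
    """Fraction of residues in `protein` covered by an exact match of any fragment."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for frag in fragments:
        if not frag:
            continue
        start = protein.find(frag)
        while start != -1:  # mark every occurrence, not just the first
            for i in range(start, start + len(frag)):
                covered[i] = True
            start = protein.find(frag, start + 1)
    return sum(covered) / len(protein)


# Toy "protein" made of the 20 standard one-letter amino acid codes.
protein = "ACDEFGHIKLMNPQRSTVWY"
print(sequence_coverage(protein, ["ACDEF", "EFGHIK", "RSTV", "WWWW"]))
```

Exact matching is strict: leucine and isoleucine have identical masses, so de novo tools generally cannot tell them apart, and an evaluation would often treat I and L as equivalent before matching.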
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, ZKI-PH 3, Nordufer 20, 13353 Berlin, Germany (corresponding author)
- Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin, Germany (corresponding author)
- Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany (corresponding author)
- Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin, Germany (corresponding author)
- Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, 12205 Berlin, Germany (corresponding author)
7. Hiort P, Hugo J, Zeinert J, Müller N, Kashyap S, Rajapakse JC, Azuaje F, Renard BY, Baum K. DrDimont: explainable drug response prediction from differential analysis of multi-omics networks. Bioinformatics 2022; 38:ii113-ii119. [PMID: 36124784 PMCID: PMC9486584 DOI: 10.1093/bioinformatics/btac477] [Indexed: 12/25/2022]
Abstract
MOTIVATION While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem. RESULTS We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. DrDimont focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response. AVAILABILITY AND IMPLEMENTATION DrDimont is available on CRAN: https://cran.r-project.org/package=DrDimont. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pauline Hiort
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Julian Hugo
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Justus Zeinert
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Nataniel Müller
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Spoorthi Kashyap
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Jagath C Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
8. Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Accepted: 07/08/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
9. Wittig A, Miranda F, Hölzer M, Altenburg T, Bartoszewicz JM, Beyvers S, Dieckmann MA, Genske U, Giese SH, Nowicka M, Richard H, Schiebenhoefer H, Schmachtenberg AJ, Sieben P, Tang M, Tembrockhaus J, Renard BY, Fuchs S. CovRadar: continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance. Bioinformatics 2022; 38:4223-4225. [PMID: 35799354 DOI: 10.1093/bioinformatics/btac411] [Received: 04/02/2021] [Revised: 05/13/2022] [Accepted: 06/13/2022] [Indexed: 12/24/2022]
Abstract
SUMMARY The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast. AVAILABILITY AND IMPLEMENTATION CovRadar is freely accessible at https://covradar.net, its open-source code is available at https://gitlab.com/dacs-hpi/covradar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
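Once CovRadar has built a multiple sequence alignment, it infers variants and a consensus. A minimal consensus rule, assumed here purely for illustration, takes each alignment column's most frequent character if it is carried by a strict majority of the sequences and writes `N` otherwise. The function name and the 50% cut-off are hypothetical, and gap characters are treated like any other symbol.

```python
from collections import Counter


def consensus(msa: list[str], min_freq: float = 0.5) -> str:
    """Column-wise consensus of equal-length aligned sequences.

    A column's most common character is used only if its frequency is strictly
    greater than `min_freq`; otherwise the column is reported as 'N'.
    """
    if not msa:
        return ""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*msa):
        char, count = Counter(column).most_common(1)[0]
        out.append(char if count / len(column) > min_freq else "N")
    return "".join(out)


print(consensus(["ACGT", "ACGA", "ACTA"]))
```

Variants can then be read off by comparing each aligned sequence, or the consensus itself, against the reference column by column.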
Affiliation(s)
- Alice Wittig
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Methods Development, Research Infrastructure and Information Technology (MFI), Bioinformatics and Systems Biology, Robert Koch Institute, Berlin, Germany
- Fábio Miranda
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Martin Hölzer
- Methods Development, Research Infrastructure and Information Technology (MFI), Bioinformatics and Systems Biology, Robert Koch Institute, Berlin, Germany
- Tom Altenburg
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Jakub M Bartoszewicz
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Methods Development, Research Infrastructure and Information Technology (MFI), Bioinformatics and Systems Biology, Robert Koch Institute, Berlin, Germany
- Sebastian Beyvers
- Department of Biology and Chemistry, Justus-Liebig-University Gießen, Gießen 35390, Germany
- Marius A Dieckmann
- Department of Biology and Chemistry, Justus-Liebig-University Gießen, Gießen 35390, Germany
- Ulrich Genske
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Sven H Giese
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Melania Nowicka
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Hugues Richard
- Methods Development, Research Infrastructure and Information Technology (MFI), Bioinformatics and Systems Biology, Robert Koch Institute, Berlin, Germany
- Henning Schiebenhoefer
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Paul Sieben
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Ming Tang
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Department of Human Genetics, Hannover Medical School, Hannover 30625, Germany
- Julius Tembrockhaus
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Bernhard Y Renard
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Potsdam 14482, Germany
- Stephan Fuchs
- Methods Development, Research Infrastructure and Information Technology (MFI), Bioinformatics and Systems Biology, Robert Koch Institute, Berlin, Germany
10. Altenburg T, Giese SH, Wang S, Muth T, Renard BY. Ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00467-7] [Indexed: 11/09/2022]
Abstract
Mass spectrometry-based proteomics provides a holistic snapshot of the entire protein set of living cells on a molecular level. Currently, only a few deep learning approaches exist that involve peptide fragmentation spectra, which represent partial sequence information of proteins. Commonly, these approaches lack the ability to characterize less studied or even unknown patterns in spectra because of their use of explicit domain knowledge. Here, to elevate unrestricted learning from spectra, we introduce ‘ad hoc learning of fragmentation’ (AHLF), a deep learning model that is end-to-end trained on 19.2 million spectra from several phosphoproteomic datasets. AHLF is interpretable, and we show that peak-level feature importance values and pairwise interactions between peaks are in line with corresponding peptide fragments. We demonstrate our approach by detecting post-translational modifications, specifically protein phosphorylation based on only the fragmentation spectrum without a database search. AHLF increases the area under the receiver operating characteristic curve (AUC) by an average of 9.4% on recent phosphoproteomic data compared with the current state of the art on this task. Furthermore, use of AHLF in rescoring search results increases the number of phosphopeptide identifications by a margin of up to 15.1% at a constant false discovery rate. To show the broad applicability of AHLF, we use transfer learning to also detect cross-linked peptides, as used in protein structure analysis, with an AUC of up to 94%.
11
Hiort P, Schlaffner CN, Steen JA, Renard BY, Steen H. multiFLEX-LF: A Computational Approach to Quantify the Modification Stoichiometries in Label-Free Proteomics Data Sets. J Proteome Res 2022; 21:899-909. [PMID: 35086334 PMCID: PMC9936407 DOI: 10.1021/acs.jproteome.1c00669] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]
Abstract
In liquid-chromatography-tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem, we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. To analyze modification dynamics and coregulated modifications, we hierarchically clustered the precursors of all proteins based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and the quantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF can drive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.
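The regression step described above can be illustrated with a minimal Python sketch. It is not multiFLEX-LF's implementation: as an assumption, it uses the median of per-precursor intensity ratios as a simple robust slope for a line through the origin, and it treats the ratio of observed to predicted intensity as a relative modification score. The function names and the demo intensities are hypothetical. A score well below 1 means the unmodified precursor lost intensity relative to the within-study reference, which is consistent with, but does not prove, modification.

```python
from statistics import median


def robust_slope_through_origin(reference, sample):
    """Median of per-precursor ratios: a simple robust slope for sample = b * reference.

    Hypothetical stand-in for the robust regression named in the abstract.
    """
    ratios = [y / x for x, y in zip(reference, sample) if x > 0]
    if not ratios:
        raise ValueError("need at least one positive reference intensity")
    slope = median(ratios)
    if slope <= 0:
        raise ValueError("robust slope must be positive")
    return slope


def relative_modification_scores(reference, sample):
    """Observed intensity divided by the intensity predicted from the robust fit."""
    slope = robust_slope_through_origin(reference, sample)
    return [y / (slope * x) if x > 0 else float("nan")
            for x, y in zip(reference, sample)]


# Made-up intensities of five unmodified precursors: the sample is loaded
# about twice as heavily as the reference, and precursor 4 has lost signal.
reference = [100.0, 200.0, 50.0, 400.0, 80.0]
sample = [210.0, 400.0, 100.0, 200.0, 160.0]
scores = relative_modification_scores(reference, sample)
```

Using the median rather than an ordinary least-squares fit keeps the single strongly depleted precursor from dragging the slope down, which is the point of a robust estimator here: the fitted slope stays at 2.0 and only the fourth precursor stands out with a score of 0.25.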
Affiliation(s)
- Pauline Hiort
- Department of Pathology, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam 14482, Germany
- Christoph N Schlaffner
- Department of Pathology, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam 14482, Germany
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Judith A Steen
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Neurobiology Program, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Bernhard Y Renard
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam 14482, Germany
- Hanno Steen
- Department of Pathology, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Neurobiology Program, Boston Children's Hospital, Boston, Massachusetts 02115, United States
- Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Precision Vaccines Program, Boston Children's Hospital, Boston, Massachusetts 02115, United States
12
Abstract
MOTIVATION Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundant sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads, as are usually analyzed in adaptive sampling applications. RESULTS Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background. AVAILABILITY AND IMPLEMENTATION The C++ source code is available at https://gitlab.com/dacs-hpi/readbouncer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
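The abstract names Interleaved Bloom Filters as the basis of ReadBouncer's read classification. The toy Python sketch below illustrates the general data structure and a reject decision under stated assumptions; it is not ReadBouncer's C++ code. Each bin (here, one reference sequence) owns one bit at every filter position, so a single lookup of a k-mer returns a bitmask of candidate bins. The hash functions (seeded BLAKE2b), the filter size, k, the synthetic sequences, and the `should_reject` rule with its `min_fraction` threshold are all hypothetical choices made for illustration.

```python
import hashlib
import random


class InterleavedBloomFilter:
    """Toy IBF: each of `size` positions stores an int whose bit b belongs to bin b."""

    def __init__(self, num_bins, size=1 << 16, num_hashes=3):
        self.num_bins = num_bins
        self.size = size
        self.num_hashes = num_hashes
        self.rows = [0] * size

    def _positions(self, kmer):
        # One seeded BLAKE2b hash per hash function (an assumption for this sketch).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                                     salt=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.size

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        """Bitmask of bins that may contain the k-mer (false positives possible)."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_ibf(references, k, **kwargs):
    """Insert every k-mer of reference i into bin i."""
    ibf = InterleavedBloomFilter(len(references), **kwargs)
    for bin_id, seq in enumerate(references):
        for kmer in kmers(seq, k):
            ibf.insert(kmer, bin_id)
    return ibf


def count_hits(ibf, read, k):
    """Number of the read's k-mers reported present in each bin."""
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        mask = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return counts


def should_reject(ibf, read, k, deplete_bins, min_fraction=0.5):
    """Hypothetical rule: eject the read if enough k-mers hit a depletion bin."""
    n = len(read) - k + 1
    if n <= 0:
        return False
    counts = count_hits(ibf, read, k)
    return any(counts[b] / n >= min_fraction for b in deplete_bins)


# Synthetic demo: bin 0 = "chromosome" to deplete, bin 1 = "plasmid" to keep.
rng = random.Random(42)
chromosome = "".join(rng.choice("ACGT") for _ in range(2000))
plasmid = "".join(rng.choice("ACGT") for _ in range(1000))
ibf = build_ibf([chromosome, plasmid], k=15)
reject_chromosome_read = should_reject(ibf, chromosome[100:400], 15, deplete_bins=[0])
reject_plasmid_read = should_reject(ibf, plasmid[100:400], 15, deplete_bins=[0])
```

Because a Bloom filter has no false negatives, every k-mer of an error-free read drawn from a depleted reference hits its bin, while false positives add only a few spurious hits that the fraction threshold absorbs. Real tools must also cope with base-calling errors and tune k, the number of hash functions, and thresholds accordingly, which this sketch ignores.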
Affiliation(s)
- Ahmad Lutfi
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Kilian Rutzen
- Genome Sequencing Unit (MF2), Robert Koch Institute, 13353 Berlin, Germany
13
Van Den Bossche T, Kunath BJ, Schallert K, Schäpe SS, Abraham PE, Armengaud J, Arntzen MØ, Bassignani A, Benndorf D, Fuchs S, Giannone RJ, Griffin TJ, Hagen LH, Halder R, Henry C, Hettich RL, Heyer R, Jagtap P, Jehmlich N, Jensen M, Juste C, Kleiner M, Langella O, Lehmann T, Leith E, May P, Mesuere B, Miotello G, Peters SL, Pible O, Queiros PT, Reichl U, Renard BY, Schiebenhoefer H, Sczyrba A, Tanca A, Trappe K, Trezzi JP, Uzzau S, Verschaffelt P, von Bergen M, Wilmes P, Wolf M, Martens L, Muth T. Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nat Commun 2021; 12:7305. [PMID: 34911965 PMCID: PMC8674281 DOI: 10.1038/s41467-021-27542-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/02/2021] [Accepted: 11/24/2021] [Indexed: 12/17/2022]
Abstract
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.
Affiliation(s)
- Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Benoit J Kunath
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Kay Schallert
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Stephanie S Schäpe
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul E Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jean Armengaud
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Magnus Ø Arntzen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Ariane Bassignani
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Dirk Benndorf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Köthen, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Stephan Fuchs
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Timothy J Griffin
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Live H Hagen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Rashi Halder
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Céline Henry
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert Heyer
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Pratik Jagtap
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Marlene Jensen
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Catherine Juste
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Manuel Kleiner
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Theresa Lehmann
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Emma Leith
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bart Mesuere
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Guylaine Miotello
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Samantha L Peters
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Olivier Pible
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Pedro T Queiros
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Udo Reichl
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Alessandro Tanca
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Jean-Pierre Trezzi
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Integrated Biobank of Luxembourg, Luxembourg Institute of Health, 1, rue Louis Rech, L-3555, Dudelange, Luxembourg
- Sergio Uzzau
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Pieter Verschaffelt
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
- Maximilian Wolf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
14
Van Den Bossche T, Kunath BJ, Schallert K, Schäpe SS, Abraham PE, Armengaud J, Arntzen MØ, Bassignani A, Benndorf D, Fuchs S, Giannone RJ, Griffin TJ, Hagen LH, Halder R, Henry C, Hettich RL, Heyer R, Jagtap P, Jehmlich N, Jensen M, Juste C, Kleiner M, Langella O, Lehmann T, Leith E, May P, Mesuere B, Miotello G, Peters SL, Pible O, Queiros PT, Reichl U, Renard BY, Schiebenhoefer H, Sczyrba A, Tanca A, Trappe K, Trezzi JP, Uzzau S, Verschaffelt P, von Bergen M, Wilmes P, Wolf M, Martens L, Muth T. Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nat Commun 2021; 12:7305. [PMID: 34911965 DOI: 10.1101/2021.03.05.433915] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/02/2021] [Accepted: 11/24/2021] [Indexed: 05/21/2023]
15
Bartoszewicz JM, Genske U, Renard BY. Deep learning-based real-time detection of novel pathogens during sequencing. Brief Bioinform 2021; 22:6326527. [PMID: 34297793 DOI: 10.1093/bib/bbab269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 06/09/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022]
Abstract
Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
Affiliation(s)
- Jakub M Bartoszewicz
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Ulrich Genske
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Bernhard Y Renard
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
16
Sanchini A, Jandrasits C, Tembrockhaus J, Kohl TA, Utpatel C, Maurer FP, Niemann S, Haas W, Renard BY, Kröger S. Improving tuberculosis surveillance by detecting international transmission using publicly available whole genome sequencing data. Euro Surveill 2021; 26. [PMID: 33446303 PMCID: PMC7809720 DOI: 10.2807/1560-7917.es.2021.26.2.1900677] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB. The large amount of publicly available whole genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analyses at a large scale. AIM We assessed the usefulness of raw WGS data of global MDR/XDR Mycobacterium tuberculosis isolates available from public repositories to improve TB surveillance. METHODS We extracted raw WGS data and the related metadata of M. tuberculosis isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR- and XDR M. tuberculosis isolates from Germany in 2012 and 2013. RESULTS We aggregated a dataset that included 1,081 MDR and 250 XDR isolates among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, Cluster 2 included 56 MDR/XDR isolates from Moldova, Georgia and Germany. When comparing the WGS data from Germany with the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information. CONCLUSION We demonstrated the added value of using WGS raw data from public repositories to contribute to TB surveillance. Comparing the German with the public dataset, we identified potential international transmission events. Thus, using this approach might support the interpretation of national surveillance results in an international context.
Affiliation(s)
- Andrea Sanchini
- These authors contributed equally to this manuscript
- Respiratory Infections Unit (FG36), Department of Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
- Christine Jandrasits
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- These authors contributed equally to this manuscript
- Julius Tembrockhaus
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thomas Andreas Kohl
- German Center for Infection Research (DZIF), partner site Hamburg - Lübeck - Borstel - Riems, Germany
- Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Christian Utpatel
- German Center for Infection Research (DZIF), partner site Hamburg - Lübeck - Borstel - Riems, Germany
- Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Florian P Maurer
- National and WHO Supranational Reference Laboratory for Mycobacteria, Research Center Borstel, Borstel, Germany
- Stefan Niemann
- German Center for Infection Research (DZIF), partner site Hamburg - Lübeck - Borstel - Riems, Germany
- Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Walter Haas
- Respiratory Infections Unit (FG36), Department of Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Stefan Kröger
- German Center for Infection Research (DZIF), partner site Hannover - Brunswick, Germany
- Respiratory Infections Unit (FG36), Department of Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
17
Abstract
MOTIVATION The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use often outdated and almost never truly up-to-date indices. RESULTS Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification. AVAILABILITY AND IMPLEMENTATION The software is open-source and available at: https://gitlab.com/rki_bioinformatics/ganon. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vitor C Piro
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin 13353, Germany; CAPES Foundation, Ministry of Education of Brazil, Brasília 70040-020, Brazil; Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Temesgen H Dadi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Enrico Seiler
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin 13353, Germany; Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
18
Verschaffelt P, Van Den Bossche T, Gabriel W, Burdukiewicz M, Soggiu A, Martens L, Renard BY, Schiebenhoefer H, Mesuere B. MegaGO: A Fast Yet Powerful Approach to Assess Functional Gene Ontology Similarity across Meta-Omics Data Sets. J Proteome Res 2021; 20:2083-2088. [PMID: 33661648 DOI: 10.1021/acs.jproteome.0c00926] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/30/2023]
Abstract
The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
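Semantic similarity between GO terms is commonly derived from information content (IC) on the ontology graph. The sketch below combines Lin similarity between two terms with a best-match average (BMA) over two term sets, one widely used combination; whether it matches MegaGO's exact measure is an assumption, and the toy ontology, term names and annotation counts are hypothetical.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: term -> parent terms
PARENTS: Dict[str, Set[str]] = {
    "root": set(), "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "B1": {"B"},
}
# Hypothetical annotation counts, already propagated up to ancestors
COUNTS: Dict[str, int] = {"root": 100, "A": 60, "B": 40, "A1": 30, "A2": 20, "B1": 10}


def ancestors(term: str) -> Set[str]:
    """All ancestors of a term, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term: str) -> float:
    # Information content: -log p(term), with p relative to the root count
    return math.log(COUNTS["root"] / COUNTS[term])


def lin(t1: str, t2: str) -> float:
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    mica = max(ic(t) for t in ancestors(t1) & ancestors(t2))
    denom = ic(t1) + ic(t2)
    return 2 * mica / denom if denom > 0 else 1.0


def bma(set1: Set[str], set2: Set[str]) -> float:
    """Best-match average of term-wise Lin similarities between two term sets."""
    best1 = [max(lin(a, b) for b in set2) for a in set1]
    best2 = [max(lin(b, a) for a in set1) for b in set2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))
```

Because exact matching would score `A1` and `A2` as unrelated, the IC-based measure is what lets sibling terms under a shared, specific parent contribute partial similarity.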
Affiliation(s)
- Pieter Verschaffelt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent 9000, Belgium
- Wassim Gabriel
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising 85354, Germany
- Michał Burdukiewicz
- Laboratory of Mass Spectrometry, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw 02-106, Poland
- Alessio Soggiu
- "One Health" Section, Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan 20122, Italy
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent 9000, Belgium
- Bernhard Y Renard
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, Potsdam 14482, Germany; Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Henning Schiebenhoefer
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, Potsdam 14482, Germany; Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Bart Mesuere
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
19
Bartoszewicz JM, Seidel A, Renard BY. Interpretable detection of novel human viruses from genome sequencing data. NAR Genom Bioinform 2021; 3:lqab004. [PMID: 33554119 PMCID: PMC7849996 DOI: 10.1093/nargab/lqab004] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/12/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 01/21/2023] Open
Abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
Affiliation(s)
- Jakub M Bartoszewicz
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
- Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
- Anja Seidel
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
- Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
20
Schlaffner CN, Kahnert K, Muntel J, Chauhan R, Renard BY, Steen JA, Steen H. FLEXIQuant-LF to quantify protein modification extent in label-free proteomics data. eLife 2020; 9:e58783. [PMID: 33284109 PMCID: PMC7721442 DOI: 10.7554/elife.58783] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/11/2020] [Accepted: 11/23/2020] [Indexed: 12/22/2022] Open
Abstract
Improvements in LC-MS/MS methods and technology have enabled the identification of thousands of modified peptides in a single experiment. However, protein regulation by post-translational modifications (PTMs) is not binary, making methods to quantify the modification extent crucial to understanding the role of PTMs. Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without knowledge of the types of modifications involved. We developed FLEXIQuant-LF using label-free quantification of unmodified peptides and robust linear regression to quantify the modification extent of peptides. As proof of concept, we applied FLEXIQuant-LF to data-independent-acquisition (DIA) data of the anaphase promoting complex/cyclosome (APC/C) during mitosis. The unbiased FLEXIQuant-LF approach to assess the modification extent in quantitative proteomics data provides a better understanding of the function and regulation of PTMs. The software is available at https://github.com/SteenOmicsLab/FLEXIQuantLF.
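The regression idea can be sketched as follows: intensities of unmodified peptides should scale linearly between a reference and a sample, so peptides falling below a robust fit are candidates for modification. FLEXIQuant-LF's actual regression method, score definition and filtering may differ. Here a Theil-Sen estimator stands in for the robust linear regression, and the ratio-based `modification_extent` (one minus observed over expected, clipped to [0, 1]) is an illustrative assumption.

```python
import statistics
from itertools import combinations
from typing import List, Tuple


def theil_sen(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def modification_extent(reference: List[float], sample: List[float]) -> List[float]:
    """Per-peptide extent = 1 - observed/expected, clipped to [0, 1] (assumed definition)."""
    slope, intercept = theil_sen(reference, sample)
    extents = []
    for ref, obs in zip(reference, sample):
        expected = slope * ref + intercept
        ratio = obs / expected if expected > 0 else 1.0
        extents.append(min(1.0, max(0.0, 1.0 - ratio)))
    return extents
```

A robust estimator matters here because the modified peptides are exactly the outliers: an ordinary least-squares fit would be pulled toward them and understate their modification extent.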
Affiliation(s)
- Christoph N Schlaffner
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
- Department of Neurology, Harvard Medical School, Boston, United States
- Konstantin Kahnert
- Department of Pathology, Boston Children’s Hospital, Boston, United States
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Department of Medical Biotechnology, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Jan Muntel
- Department of Pathology, Boston Children’s Hospital, Boston, United States
- Department of Pathology, Harvard Medical School, Boston, United States
- Ruchi Chauhan
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Judith A Steen
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
- Department of Neurology, Harvard Medical School, Boston, United States
- Hanno Steen
- Department of Pathology, Boston Children’s Hospital, Boston, United States
- Department of Pathology, Harvard Medical School, Boston, United States
- Precision Vaccines Program, Boston Children’s Hospital, Boston, United States
21
Muñoz-Benavent M, Hartkopf F, Van Den Bossche T, Piro VC, García-Ferris C, Latorre A, Renard BY, Muth T. Erratum: gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms. NAR Genom Bioinform 2020; 2:lqaa083. [PMID: 33577626 PMCID: PMC7671335 DOI: 10.1093/nargab/lqaa083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
[This corrects the article DOI: 10.1093/nargab/lqaa058.].
Affiliation(s)
- Maria Muñoz-Benavent
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Vitor C Piro
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Carlos García-Ferris
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Amparo Latorre
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
22
Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020; 15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kay Schallert
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Emanuel Schmid
- ID Computational & Data Science Support, Eidgenössische Technische Hochschule, Zurich, Switzerland
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Katharina Riedel
- Center for Functional Genomics of Microbes (CFGM), Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Stephan Fuchs
- Department of Infectious Diseases, Robert Koch Institute, Wernigerode, Germany
23
Bartoszewicz JM, Seidel A, Rentzsch R, Renard BY. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 2020; 36:81-89. [PMID: 31298694 DOI: 10.1093/bioinformatics/btz541] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/09/2019] [Revised: 06/22/2019] [Accepted: 07/10/2019] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. RESULTS We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state-of-the-art based on both sequence homology and machine learning. Combining a deep learning approach with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. AVAILABILITY AND IMPLEMENTATION The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
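Reverse-complement parameter sharing can be illustrated in plain Python: one convolutional filter is scanned over a one-hot read together with its own reverse complement, so a read and its reverse complement produce mirrored activation tracks and, after global max pooling, the same score. This is a minimal sketch under simplifying assumptions (a single filter, no bias, nonlinearity or training) and not DeePaC's architecture; the names `ALPHABET`, `rc_matrix`, `rc_shared_conv` and `rc_invariant_score` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
ALPHABET = "ACGT"  # with this order, complementing a base reverses the channel order


def one_hot(seq: str) -> Matrix:
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def rc_matrix(m: Matrix) -> Matrix:
    """Reverse complement of a one-hot sequence or a filter (positions x channels)."""
    return [row[::-1] for row in m[::-1]]


def conv1d(x: Matrix, w: Matrix) -> List[float]:
    """Valid 1D convolution (cross-correlation) of sequence x with filter w."""
    k = len(w)
    return [
        sum(x[i + j][c] * w[j][c] for j in range(k) for c in range(len(ALPHABET)))
        for i in range(len(x) - k + 1)
    ]


def rc_shared_conv(x: Matrix, w: Matrix) -> List[List[float]]:
    """Scan with w and with its reverse complement; both tracks share w's parameters."""
    return [conv1d(x, w), conv1d(x, rc_matrix(w))]


def rc_invariant_score(x: Matrix, w: Matrix) -> float:
    """Global max pooling over positions and both filter orientations."""
    return max(max(track) for track in rc_shared_conv(x, w))
```

Sharing the weights this way halves the number of free parameters per filter pair and guarantees that a read's prediction does not depend on which strand was sequenced.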
Affiliation(s)
- Jakub M Bartoszewicz
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Anja Seidel
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Robert Rentzsch
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
24
Muñoz-Benavent M, Hartkopf F, Van Den Bossche T, Piro VC, García-Ferris C, Latorre A, Renard BY, Muth T. gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms. NAR Genom Bioinform 2020; 2:lqaa058. [PMID: 33575609 PMCID: PMC7671378 DOI: 10.1093/nargab/lqaa058] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/31/2020] [Revised: 06/19/2020] [Accepted: 08/03/2020] [Indexed: 01/14/2023] Open
Abstract
The study of bacterial symbioses has grown exponentially in the recent past. However, existing bioinformatic workflows for microbiome data analysis do not commonly integrate multiple meta-omics levels and are mainly geared toward human microbiomes. Microbiota are better understood when analyzed in their biological context, that is, together with their host or environment. Nevertheless, this is a limitation when studying non-model organisms, mainly due to the lack of well-annotated sequence references. Here, we present gNOMO, a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels: metagenomics, metatranscriptomics and metaproteomics in an integrative manner. The pipeline has been developed using the workflow management framework Snakemake in order to obtain an automated and reproducible pipeline. Using experimental datasets of the German cockroach Blattella germanica, a non-model organism with a very complex gut microbiome, we show the capabilities of gNOMO with regard to meta-omics data integration, expression ratio comparison, taxonomic and functional analysis as well as intuitive output visualization. In conclusion, gNOMO is a bioinformatic pipeline that can easily be configured, for integrating and analyzing multiple meta-omics data types and for producing output visualizations, specifically designed for integrating paired-end sequencing data with mass spectrometry from non-model organisms.
Affiliation(s)
- Maria Muñoz-Benavent
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Vitor C Piro
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Carlos García-Ferris
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Amparo Latorre
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València/CSIC, Paterna (València) 46980, Spain
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin 13353, Germany
25
Van Den Bossche T, Verschaffelt P, Schallert K, Barsnes H, Dawyndt P, Benndorf D, Renard BY, Mesuere B, Martens L, Muth T. Connecting MetaProteomeAnalyzer and PeptideShaker to Unipept for Seamless End-to-End Metaproteomics Data Analysis. J Proteome Res 2020; 19:3562-3566. [DOI: 10.1021/acs.jproteome.0c00136] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/08/2023]
Affiliation(s)
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, St. Pietersnieuwstraat 33, 9000 Ghent, Belgium
- Pieter Verschaffelt
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Krijgslaan 281-S9, 9000 Ghent, Belgium
- Kay Schallert
- Bioprocess Engineering, Faculty for Process and Systems Engineering, Otto von Guericke University, Universitaetsplatz 2, 39106 Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Bernburger Straße 55, 06366 Köthen, Germany
- Harald Barsnes
- Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Postboks 7804, NO-5020 Bergen, Norway
- Computational Biology Unit (CBU), Department of Informatics, University of Bergen, Postboks 7804, N-5020 Bergen, Norway
- Peter Dawyndt
- Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Krijgslaan 281-S9, 9000 Ghent, Belgium
- Dirk Benndorf
- Bioprocess Engineering, Faculty for Process and Systems Engineering, Otto von Guericke University, Universitaetsplatz 2, 39106 Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Bernburger Straße 55, 06366 Köthen, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße, 39106 Magdeburg, Germany
- Bernhard Y. Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany
- Bart Mesuere
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, St. Pietersnieuwstraat 33, 9000 Ghent, Belgium
- Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Krijgslaan 281-S9, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, St. Pietersnieuwstraat 33, 9000 Ghent, Belgium
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- eScience Division (S.3), Federal Institute for Materials Research and Testing, Unter den Eichen 87, 12205 Berlin, Germany
26
Kuhring M, Doellinger J, Nitsche A, Muth T, Renard BY. TaxIt: An Iterative Computational Pipeline for Untargeted Strain-Level Identification Using MS/MS Spectra from Pathogenic Single-Organism Samples. J Proteome Res 2020; 19:2501-2510. [PMID: 32362126 DOI: 10.1021/acs.jproteome.9b00714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Untargeted accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry is a challenging task. Reference databases often lack taxonomic depth, limiting peptide assignments to the species level. However, the extension with detailed strain information increases runtime and decreases statistical power. In addition, larger databases contain a higher number of similar proteomes. We present TaxIt, an iterative workflow to address the increasing search space required for MS/MS-based strain-level classification of samples with unknown taxonomic origin. TaxIt first applies reference sequence data for initial identification of species candidates, followed by automated acquisition of relevant strain sequences for low-level classification. Furthermore, proteome similarities resulting in ambiguous taxonomic assignments are addressed with an abundance weighting strategy to increase the confidence in candidate taxa. For benchmarking the performance of our method, we apply our iterative workflow to several samples of bacterial and viral origin. In comparison to noniterative approaches using unique peptides or advanced abundance correction, TaxIt identifies microbial strains correctly in all examples presented (with one tie), thereby demonstrating the potential for untargeted and deeper taxonomic classification. TaxIt makes extensive use of public, unrestricted, and continuously growing sequence resources such as the NCBI databases and is available under open-source BSD license at https://gitlab.com/rki_bioinformatics/TaxIt.
Affiliation(s)
- Mathias Kuhring
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; Core Unit Bioinformatics, Berlin Institute of Health (BIH), 10178 Berlin, Germany; Berlin Institute of Health Metabolomics Platform, Berlin Institute of Health (BIH), 10178 Berlin, Germany; Max Delbrück Center (MDC) for Molecular Medicine, 13125 Berlin, Germany
- Joerg Doellinger
- Centre for Biological Threats and Special Pathogens, Proteomics and Spectroscopy (ZBS 6), Robert Koch Institute, 13353 Berlin, Germany; Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; eScience Division (S.3), Federal Institute for Materials Research and Testing, 12489 Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
27
Anzt H, Bach F, Druskat S, Löffler F, Loewe A, Renard BY, Seemann G, Struck A, Achhammer E, Aggarwal P, Appel F, Bader M, Brusch L, Busse C, Chourdakis G, Dabrowski PW, Ebert P, Flemisch B, Friedl S, Fritzsch B, Funk MD, Gast V, Goth F, Grad JN, Hegewald J, Hermann S, Hohmann F, Janosch S, Kutra D, Linxweiler J, Muth T, Peters-Kottig W, Rack F, Raters FH, Rave S, Reina G, Reißig M, Ropinski T, Schaarschmidt J, Seibold H, Thiele JP, Uekermann B, Unger S, Weeber R. An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action. F1000Res 2020; 9:295. [PMID: 33552475 PMCID: PMC7845155 DOI: 10.12688/f1000research.23224.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 04/09/2020] [Indexed: 08/22/2023] Open
Abstract
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Affiliation(s)
- Hartwig Anzt
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
  - University of Tennessee, Knoxville, TN, USA
- Felix Bach
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Stephan Druskat
  - Friedrich Schiller University, Jena, Germany
  - German Aerospace Center (DLR), Berlin, Germany
  - Humboldt-Universität zu Berlin, Berlin, Germany
- Frank Löffler
  - Friedrich Schiller University, Jena, Germany
  - Louisiana State University, Baton Rouge, LA, USA
- Axel Loewe
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Bernhard Y. Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Gunnar Seemann
  - University Heart Centre Freiburg Bad Krozingen, Freiburg, Germany
- Franziska Appel
  - Leibniz Institute of Agricultural Development in Transition Economies (IAMO), Halle (Saale), Germany
- Lutz Brusch
  - Technische Universität Dresden, Dresden, Germany
- Peter Ebert
  - Saarland Informatics Campus, Saarbrücken, Germany
- Volker Gast
  - Friedrich Schiller University, Jena, Germany
- Stephan Janosch
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Dominik Kutra
  - European Molecular Biology Laboratory, Heidelberg, Germany
- Jan Linxweiler
  - Technische Universität Braunschweig, Braunschweig, Germany
- Thilo Muth
  - Federal Institute for Materials Research and Testing, Berlin, Germany
- Fabian Rack
  - FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Karlsruhe, Germany
- Malte Reißig
  - Institute for Advanced Sustainability Studies, Potsdam, Germany
- Timo Ropinski
  - Ulm University, Ulm, Germany
  - Linköping University, Linköping, Sweden
- Heidi Seibold
  - Ludwig Maximilian University of Munich, München, Germany
- Stefan Unger
  - Julius Kühn-Institut (JKI), Quedlinburg, Germany
28
Anzt H, Bach F, Druskat S, Löffler F, Loewe A, Renard BY, Seemann G, Struck A, Achhammer E, Aggarwal P, Appel F, Bader M, Brusch L, Busse C, Chourdakis G, Dabrowski PW, Ebert P, Flemisch B, Friedl S, Fritzsch B, Funk MD, Gast V, Goth F, Grad JN, Hegewald J, Hermann S, Hohmann F, Janosch S, Kutra D, Linxweiler J, Muth T, Peters-Kottig W, Rack F, Raters FH, Rave S, Reina G, Reißig M, Ropinski T, Schaarschmidt J, Seibold H, Thiele JP, Uekermann B, Unger S, Weeber R. An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action. F1000Res 2020; 9:295. [PMID: 33552475 PMCID: PMC7845155 DOI: 10.12688/f1000research.23224.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/11/2021] [Indexed: 11/20/2022]
Abstract
Research software has become a central asset in academic research. It optimizes existing research methods and enables new ones, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal of ensuring that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first-class citizen in research. This paper is the outcome of two workshops run in Germany in 2019: one at deRSE19, the first International Conference of Research Software Engineers in Germany, and a dedicated DFG-supported follow-up workshop in Berlin.
Affiliation(s)
- Hartwig Anzt
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
  - University of Tennessee, Knoxville, TN, USA
- Felix Bach
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Stephan Druskat
  - Friedrich Schiller University, Jena, Germany
  - German Aerospace Center (DLR), Berlin, Germany
  - Humboldt-Universität zu Berlin, Berlin, Germany
- Frank Löffler
  - Friedrich Schiller University, Jena, Germany
  - Louisiana State University, Baton Rouge, LA, USA
- Axel Loewe
  - Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Bernhard Y. Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Gunnar Seemann
  - University Heart Centre Freiburg Bad Krozingen, Freiburg, Germany
- Franziska Appel
  - Leibniz Institute of Agricultural Development in Transition Economies (IAMO), Halle (Saale), Germany
- Lutz Brusch
  - Technische Universität Dresden, Dresden, Germany
- Peter Ebert
  - Saarland Informatics Campus, Saarbrücken, Germany
- Volker Gast
  - Friedrich Schiller University, Jena, Germany
- Stephan Janosch
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Dominik Kutra
  - European Molecular Biology Laboratory, Heidelberg, Germany
- Jan Linxweiler
  - Technische Universität Braunschweig, Braunschweig, Germany
- Thilo Muth
  - Federal Institute for Materials Research and Testing, Berlin, Germany
- Fabian Rack
  - FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Karlsruhe, Germany
- Malte Reißig
  - Institute for Advanced Sustainability Studies, Potsdam, Germany
- Timo Ropinski
  - Ulm University, Ulm, Germany
  - Linköping University, Linköping, Sweden
- Heidi Seibold
  - Ludwig Maximilian University of Munich, München, Germany
- Stefan Unger
  - Julius Kühn-Institut (JKI), Quedlinburg, Germany
29
Jandrasits C, Kröger S, Haas W, Renard BY. Computational pan-genome mapping and pairwise SNP-distance improve detection of Mycobacterium tuberculosis transmission clusters. PLoS Comput Biol 2019; 15:e1007527. [PMID: 31815935 PMCID: PMC6922483 DOI: 10.1371/journal.pcbi.1007527] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/01/2019] [Revised: 12/19/2019] [Accepted: 11/03/2019] [Indexed: 12/30/2022]
Abstract
Next-generation sequencing-based, base-by-base distance measures have become an integral complement to the epidemiological investigation of infectious disease outbreaks. This study introduces PANPASCO, a pairwise distance method based on computational pan-genome mapping that is highly sensitive to differences between cases, even when these are located in regions of lineage-specific reference genomes. We show that our approach is superior to previously published methods in several datasets and across different Mycobacterium tuberculosis lineages, as its characteristics allow the comparison of a high number of diverse samples in one analysis, a scenario that becomes increasingly likely with the growing use of whole-genome sequencing in transmission surveillance. Tuberculosis is still a threat to global health. It is essential to detect and interrupt transmissions to stop the spread of this infectious disease. With the rising use of next-generation sequencing methods, their application in the surveillance of Mycobacterium tuberculosis has become increasingly important in recent years. The main goal of molecular surveillance is the identification of patient-to-patient transmission and the detection of clusters. The mutation rate of M. tuberculosis is very low and stable. Therefore, many existing methods for the comparative analysis of isolates provide inadequate results, since their resolution is too limited. There is a need for a method that takes every detectable difference into account. We developed PANPASCO, a novel approach for comparing pairs of isolates using all genomic information available for each pair. We combine improved SNP-distance calculation with the use of a pan-genome incorporating more than 100 M. tuberculosis reference genomes representing lineages 1-4 for read mapping prior to variant detection. We thereby enable the collective analysis and comparison of similar and diverse isolates associated with different M. tuberculosis strains.
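The pairwise idea described above, comparing each pair of isolates over all positions reliably called in both of them rather than over a core set shared by every sample, can be sketched in Python. This is a minimal illustration under assumed inputs, not PANPASCO's implementation: each isolate is represented as a hypothetical dictionary from reference position to a confidently called base, and pan-genome mapping, variant calling and quality filtering are out of scope.

```python
from typing import Dict, Set, Tuple

# Hypothetical input format: reference position -> confidently called base.
# Positions without a confident call (low coverage, ambiguous) are absent.
Calls = Dict[int, str]


def pairwise_snp_distance(a: Calls, b: Calls) -> int:
    """Count differing bases over positions confidently called in BOTH isolates."""
    shared = a.keys() & b.keys()
    return sum(1 for pos in shared if a[pos] != b[pos])


def core_snp_distance(a: Calls, b: Calls, core: Set[int]) -> int:
    """For contrast: compare only positions callable in every sample of the dataset."""
    return sum(1 for pos in core if a[pos] != b[pos])


def distance_matrix(samples: Dict[str, Calls]) -> Dict[Tuple[str, str], int]:
    """Pairwise distances for all unordered pairs of samples."""
    names = sorted(samples)
    return {
        (x, y): pairwise_snp_distance(samples[x], samples[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    isolates = {
        "iso1": {100: "A", 250: "G", 400: "T"},
        "iso2": {100: "A", 250: "C"},  # position 400 not callable here
        "iso3": {100: "T", 250: "G", 400: "C"},
    }
    core = set.intersection(*(set(calls) for calls in isolates.values()))
    print(distance_matrix(isolates))
    print(core_snp_distance(isolates["iso1"], isolates["iso3"], core))
```

In this toy data, iso1 and iso3 differ at two positions when compared pairwise, but only at one when the comparison is restricted to the positions callable in all three isolates, which illustrates the resolution a dataset-wide filter can lose.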
Affiliation(s)
- Stefan Kröger
  - Respiratory Infections Unit, Robert Koch Institute, Berlin, Germany
- Walter Haas
  - Respiratory Infections Unit, Robert Koch Institute, Berlin, Germany
30
Abstract
The sequential paradigm of data acquisition and analysis in next-generation sequencing leads to high turnaround times for the generation of interpretable results. We combined a novel real-time read mapping algorithm with fast variant calling to obtain reliable variant calls while sequencing is still in progress. Our new algorithm provides accurate read mapping results for intermediate cycles and supports large reference genomes such as the complete human reference. This enables the combination of real-time read mapping results with complex follow-up analyses. In this study, we demonstrated the accuracy and scalability of our approach by applying real-time read mapping and variant calling to seven publicly available human whole-exome sequencing datasets. Up to 89% of all detected SNPs were already identified after 40 sequencing cycles, with precision similar to that at the end of sequencing. The final results showed accuracy similar to that of conventional post hoc analysis methods. Compared with standard routines, our live approach enables considerably faster interventions in clinical applications and infectious disease outbreaks. Besides variant calling, our approach can be adapted for a plethora of other mapping-based analyses.
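The interim-versus-final comparison reported above (the share of SNPs already found after a given number of cycles, and the precision at that point) amounts to simple set arithmetic. The Python sketch below assumes a hypothetical representation of variant calls as (chromosome, position, alternative allele) tuples and a separate truth set for precision; the study's actual evaluation procedure is not described here, so this only illustrates the bookkeeping.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternative allele).
Variant = Tuple[str, int, str]


def share_of_final_found(interim: Set[Variant], final: Set[Variant]) -> float:
    """Fraction of the end-of-run calls that were already present at an interim cycle."""
    return len(interim & final) / len(final) if final else 0.0


def precision(calls: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of calls confirmed by a truth set (e.g. a curated benchmark)."""
    return len(calls & truth) / len(calls) if calls else 0.0


def summarize(
    calls_by_cycle: Dict[int, Set[Variant]], truth: Set[Variant]
) -> Dict[int, Tuple[float, float]]:
    """Per cycle: (share of final calls already found, precision against truth)."""
    final = calls_by_cycle[max(calls_by_cycle)]  # calls at the last available cycle
    return {
        cycle: (share_of_final_found(calls, final), precision(calls, truth))
        for cycle, calls in sorted(calls_by_cycle.items())
    }


if __name__ == "__main__":
    truth = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T"), ("chr2", 80, "C")}
    calls = {
        40: {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr3", 5, "A")},
        100: {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T"), ("chr3", 5, "A")},
    }
    for cycle, (found, prec) in summarize(calls, truth).items():
        print(f"cycle {cycle}: {found:.0%} of final calls found, precision {prec:.2f}")
```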
Affiliation(s)
- Tobias P Loka
  - Bioinformatics Division (MF 1), Department for Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Simon H Tausch
  - Bioinformatics Division (MF 1), Department for Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, Berlin, Germany
  - German Federal Institute for Risk Assessment (BfR), Department of Biological Safety, Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics Division (MF 1), Department for Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
31
Rentzsch R, Deneke C, Nitsche A, Renard BY. Predicting bacterial virulence factors - evaluation of machine learning and negative data strategies. Brief Bioinform 2019; 21:1596-1608. [PMID: 32978619 DOI: 10.1093/bib/bbz076] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/08/2019] [Revised: 05/17/2019] [Accepted: 06/01/2019] [Indexed: 11/12/2022]
Abstract
Bacterial proteins dubbed virulence factors (VFs) are a highly diverse group of sequences whose only obvious commonality is the very property of being, more or less directly, involved in virulence. It is therefore tempting to speculate whether their prediction, based on direct sequence similarity (seqsim) to known VFs, could be enhanced or even replaced by machine-learning methods. Specifically, when trained on a large and diverse set of VFs, such methods may be able to detect putative, non-trivial characteristics shared by otherwise unrelated VF families and therefore better predict novel VFs with insignificant similarity to each individual family. We therefore first reassess the performance of dimer-based Support Vector Machines, as employed in the widely used MP3 method, in light of seqsim-only and seqsim/dimer-hybrid classifiers. We then repeat the analysis with a novel, considerably more diverse data set, also addressing the important problem of negative data selection. Finally, we move on to the real-world use case of proteome-wide VF prediction, outlining different approaches to estimating specificity in this scenario. We find that direct seqsim is of unparalleled importance and therefore should always be exploited. Further, we observe strikingly low correlations between different feature and classifier types when ranking proteins by VF likeness. We therefore propose a 'best of each world' approach to prioritize proteins for experimental testing, focussing on the top predictions of each classifier. Further, classifiers for individual VF families should be developed.
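Dimer-based protein features are commonly computed as dipeptide composition: the relative frequency of each of the 400 possible pairs of adjacent amino acids. A minimal Python sketch of such a feature vector follows. The handling of non-standard residues and the normalisation are assumptions for illustration; MP3's exact feature definition is not given in this abstract.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIMERS: List[str] = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 dimers
DIMER_INDEX: Dict[str, int] = {dimer: i for i, dimer in enumerate(DIMERS)}


def dimer_composition(seq: str) -> List[float]:
    """Relative frequencies of the 400 overlapping amino-acid dimers in `seq`.

    Dimers containing non-standard residues (e.g. X, U) are skipped, and the
    counts are normalised by the number of valid dimers; both choices are
    assumptions of this sketch.
    """
    counts = [0] * len(DIMERS)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 1):
        idx = DIMER_INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = dimer_composition("MKKLLAV")
    nonzero = {DIMERS[i]: round(f, 3) for i, f in enumerate(features) if f > 0}
    print(len(features), nonzero)
```

Each protein thus becomes a fixed-length vector regardless of its length, which is what allows sequences of very different sizes to be fed to a standard SVM implementation.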
Affiliation(s)
- Robert Rentzsch
  - Bioinformatics Unit (MF 1), Robert Koch Institute, Berlin
  - Institute for Innovation and Technology (IIT), Steinplatz 1, Berlin
- Carlus Deneke
  - Bioinformatics Unit (MF 1), Robert Koch Institute, Berlin
  - Molecular Microbiology and Genome Analysis Unit, German Federal Institute for Risk Assessment, Berlin
- Andreas Nitsche
  - Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, Berlin
32
Edenborough KM, Bokelmann M, Lander A, Couacy-Hymann E, Lechner J, Drechsel O, Renard BY, Radonić A, Feldmann H, Kurth A, Prescott J. Dendritic Cells Generated From Mops condylurus, a Likely Filovirus Reservoir Host, Are Susceptible to and Activated by Zaire Ebolavirus Infection. Front Immunol 2019; 10:2414. [PMID: 31681302 PMCID: PMC6797855 DOI: 10.3389/fimmu.2019.02414] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/25/2019] [Accepted: 09/26/2019] [Indexed: 12/17/2022]
Abstract
Ebola virus infection of human dendritic cells (DCs) induces atypical adaptive immune responses and thereby exacerbates Ebola virus disease (EVD). Human DCs infected with Ebola virus aberrantly express low levels of the DC activation markers CD80, CD86, and MHC class II. The ensuing T cell responses are commonly anergic rather than protective against EVD. We hypothesize that DCs derived from potential reservoir hosts such as bats, which do not develop disease signs in response to Ebola virus infection, would exhibit features associated with activation. In this study, we examined Zaire ebolavirus (EBOV) infection of DCs derived from the Angolan free-tailed bat species Mops condylurus. This species was previously identified as permissive to EBOV infection in vivo, in the absence of disease signs. M. condylurus has also recently been implicated as the reservoir host for Bombali ebolavirus, a virus species closely related to EBOV. Because no M. condylurus species-specific reagents were available, we characterized its de novo assembled transcriptome and defined its phylogenetic similarity to other mammals, which enabled the identification of cross-reactive reagents for M. condylurus bone marrow-derived DC (bat-BMDC) differentiation and immune cell phenotyping. Our results reveal that bat-BMDCs are susceptible to EBOV infection, as determined by the detection of EBOV-specific viral RNA (vRNA). vRNA increased significantly 72 h after EBOV infection and was detected both in cells and in culture supernatants. Bat-BMDC infection was further confirmed by the observation of GFP expression in DC cultures infected with a recombinant GFP-EBOV. Bat-BMDCs upregulated CD80 and chemokine ligand 3 (CCL3) transcripts in response to EBOV infection, which positively correlated with the expression levels of EBOV vRNA. In contrast to the aberrant responses to EBOV infection that are typical of human DCs, our findings from bat-BMDCs provide evidence for an immunological basis of asymptomatic EBOV infection outcomes.
Affiliation(s)
- Kathryn M Edenborough
  - Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Marcel Bokelmann
  - Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Angelika Lander
  - Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Emmanuel Couacy-Hymann
  - LANADA, Laboratoire National d'Appui au Développement Agricole, Bingerville, Côte d'Ivoire
- Johanna Lechner
  - Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Oliver Drechsel
  - Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
  - Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Aleksandar Radonić
  - Methodology and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Heinz Feldmann
  - Laboratory of Virology, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, United States
- Andreas Kurth
  - Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Joseph Prescott
  - Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
33
Andrusch A, Dabrowski PW, Klenner J, Tausch SH, Kohl C, Osman AA, Renard BY, Nitsche A. PAIPline: pathogen identification in metagenomic and clinical next generation sequencing samples. Bioinformatics 2019; 34:i715-i721. [PMID: 30423069 PMCID: PMC6129269 DOI: 10.1093/bioinformatics/bty595] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 01/11/2023]
Abstract
Motivation: Next-generation sequencing (NGS) has provided researchers with a powerful tool to characterize metagenomic and clinical samples in research and diagnostic settings. NGS allows an open view into samples, which is useful for detecting pathogens in an unbiased fashion and without a prior hypothesis about possible causative agents. However, NGS datasets for pathogen detection come with various obstacles, such as a very unfavorable ratio of pathogen to host reads. In addition to frequent false positives and irrelevant organisms such as contaminants, tools are often challenged by samples with low pathogen loads and might not report organisms present below a certain threshold. Furthermore, some metagenomic profiling tools focus on only one particular group of pathogens, for example bacteria. Results: We present PAIPline, a bioinformatics pipeline specifically designed to address problems associated with detecting pathogens in diagnostic samples. PAIPline particularly focuses on user-friendliness and encapsulates all necessary steps, from preprocessing through the resolution of ambiguous reads and filtering up to visualization, in a single tool. In contrast to existing tools, PAIPline is more specific while maintaining sensitivity. This is shown in a comparative evaluation in which PAIPline was benchmarked alongside other well-known metagenomic profiling tools on previously published, well-characterized datasets. Additionally, as part of an international cooperation project, PAIPline was applied to an outbreak sample of hemorrhagic fevers of then-unknown etiology. The presented results show that PAIPline can serve as a robust, reliable, user-friendly, adaptable and generalizable stand-alone software for diagnostics from NGS samples and as a stepping stone for further downstream analyses. Availability and implementation: PAIPline is freely available at https://gitlab.com/rki_bioinformatics/paipline.
Affiliation(s)
- Andreas Andrusch
  - Highly Pathogenic Viruses (ZBS1), Robert Koch Institute, Berlin, Germany
- Jeanette Klenner
  - Highly Pathogenic Viruses (ZBS1), Robert Koch Institute, Berlin, Germany
- Simon H Tausch
  - Highly Pathogenic Viruses (ZBS1), Robert Koch Institute, Berlin, Germany
- Claudia Kohl
  - Highly Pathogenic Viruses (ZBS1), Robert Koch Institute, Berlin, Germany
- Andreas Nitsche
  - Highly Pathogenic Viruses (ZBS1), Robert Koch Institute, Berlin, Germany
34
Dadi TH, Siragusa E, Piro VC, Andrusch A, Seiler E, Renard BY, Reinert K. DREAM-Yara: an exact read mapper for very large databases with short update time. Bioinformatics 2019; 34:i766-i772. [PMID: 30423080 DOI: 10.1093/bioinformatics/bty567] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/30/2022]
Abstract
Motivation: Mapping-based approaches have become limited in their application to very large sets of references, since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of NGS reads to reference databases. For instance, in typical metagenomics analyses, the size of the reference sequences has made it prohibitive to compute a single full-text index on standard machines. Even on large-memory machines, computing such an index takes about 1 day of computing time. As a result, indices are rarely updated. Hence, it is desirable to create an alternative way of indexing while preserving fast search times. Results: To solve the index construction and update problem, we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is the introduction of an approximate search distributor via a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads from parts of the database where they cannot match. This allows us to keep the database in several indices, which can easily be rebuilt when parts are updated, while maintaining a fast search time. The second main contribution is DREAM-Yara, a distributed version of a fully sensitive read mapper implemented under the DREAM framework. Availability and implementation: https://gitlab.com/pirovc/dream_yara/.
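To make the interleaved Bloom filter idea concrete, here is a toy Python sketch. Each bin (for example, one part of the reference database) has its own logical Bloom filter; the filters are interleaved so that one lookup per hash position returns a bit vector over all bins, and a read is only forwarded to bins in which enough of its k-mers may occur. The hash functions (seeded SHA-256), filter size, k-mer length and fixed k-mer count threshold are illustrative assumptions, not DREAM-Yara's actual parameters, and a real implementation would use a packed bit array rather than Python integers.

```python
import hashlib
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (IBF) with one logical Bloom filter per bin.

    All bins share the same size and hash functions. The bits for a given
    hash position are stored together: bit b of rows[p] is set if bin b has
    position p set, so one lookup per position answers for all bins at once.
    """

    def __init__(self, n_bins: int, size: int, n_hashes: int = 3) -> None:
        self.n_bins = n_bins
        self.size = size
        self.n_hashes = n_hashes
        self.rows: List[int] = [0] * size

    def _positions(self, kmer: str) -> List[int]:
        # Seeded SHA-256 stands in for the fast hash functions a real IBF would use.
        positions = []
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "little") % self.size)
        return positions

    def insert(self, bin_id: int, kmer: str) -> None:
        for p in self._positions(kmer):
            self.rows[p] |= 1 << bin_id

    def query(self, kmer: str) -> int:
        """Bit vector of bins that may contain the k-mer (false positives possible)."""
        hits = (1 << self.n_bins) - 1
        for p in self._positions(kmer):
            hits &= self.rows[p]  # a bin survives only if every position is set
        return hits

    def candidate_bins(self, read_kmers: Iterable[str], threshold: int) -> List[int]:
        """Bins in which at least `threshold` of the read's k-mers may occur."""
        counts = [0] * self.n_bins
        for kmer in read_kmers:
            hits = self.query(kmer)
            for b in range(self.n_bins):
                if (hits >> b) & 1:
                    counts[b] += 1
        return [b for b, c in enumerate(counts) if c >= threshold]


if __name__ == "__main__":
    k = 5
    bins = ["ACGTACGTTGCAAGGCTTAACCGGTTAAC", "GGGGCCCCAAAATTTTGGGGCCCCAAAATTTT"]
    ibf = InterleavedBloomFilter(n_bins=len(bins), size=1 << 16)
    for bin_id, ref in enumerate(bins):
        for km in kmers(ref, k):
            ibf.insert(bin_id, km)
    read = "TTGCAAGGCTTAACC"  # substring of bin 0
    print(ibf.candidate_bins(kmers(read, k), threshold=9))
```

Only the bins that pass the threshold would then be searched with their own index. Because a Bloom filter produces false positives but no false negatives, a bin that truly shares at least `threshold` k-mers with the read is never excluded, while rebuilding one bin's filter and index leaves the others untouched.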
Affiliation(s)
- Enrico Siragusa
  - Computational Genomics, IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Vitor C Piro
  - Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
  - CAPES Foundation, Ministry of Education of Brazil, Brasília DF, Brazil
- Andreas Andrusch
  - Centre for Biological Threats and Special Pathogens (ZBS1), Robert Koch Institute, Berlin, Germany
- Enrico Seiler
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Knut Reinert
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Berlin, Germany
35
Uretmen Kagiali ZC, Sanal E, Karayel Ö, Polat AN, Saatci Ö, Ersan PG, Trappe K, Renard BY, Önder TT, Tuncbag N, Şahin Ö, Ozlu N. Systems-level Analysis Reveals Multiple Modulators of Epithelial-mesenchymal Transition and Identifies DNAJB4 and CD81 as Novel Metastasis Inducers in Breast Cancer. Mol Cell Proteomics 2019; 18:1756-1771. [PMID: 31221721 PMCID: PMC6731077 DOI: 10.1074/mcp.ra119.001446] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/15/2019] [Revised: 05/21/2019] [Indexed: 01/01/2023]
Abstract
Epithelial-mesenchymal transition (EMT) is driven by complex signaling events that induce dramatic biochemical and morphological changes whereby epithelial cells are converted into cancer cells. However, the underlying molecular mechanisms remain elusive. Here, we used a mass spectrometry-based quantitative proteomics approach to systematically analyze the post-translational biochemical changes that drive the differentiation of human mammary epithelial (HMLE) cells into mesenchymal cells. We identified 314 proteins out of more than 6,000 unique proteins and 871 phosphopeptides out of more than 7,000 unique phosphopeptides as differentially regulated. We found that the phosphoproteome is more unstable and prone to changes during EMT than the proteome, and that multiple alterations at the proteome level are not thoroughly represented by transcriptional data, highlighting the necessity of proteome-level analysis. We discovered cell state-specific signaling pathways, such as Hippo, sphingolipid signaling, and the unfolded protein response (UPR), by modeling the networks of regulated proteins and potential kinase-substrate groups. We identified two novel factors for EMT whose expression increased upon EMT induction: DnaJ heat shock protein family (Hsp40) member B4 (DNAJB4) and cluster of differentiation 81 (CD81). Suppression of DNAJB4 or CD81 in mesenchymal breast cancer cells resulted in decreased cell migration in vitro and led to reduced primary tumor growth, extravasation, and lung metastasis in vivo. Overall, we performed global proteomic and phosphoproteomic analyses of EMT and identified and validated new mRNA- and/or protein-level modulators of EMT. This work also provides a unique platform and resource for future studies focusing on metastasis and drug resistance.
Affiliation(s)
- Erdem Sanal
  - Department of Molecular Biology and Genetics, Koç University, 34450 Istanbul, Turkey
- Özge Karayel
  - Department of Molecular Biology and Genetics, Koç University, 34450 Istanbul, Turkey
- Ayse Nur Polat
  - Department of Molecular Biology and Genetics, Koç University, 34450 Istanbul, Turkey
- Özge Saatci
  - Department of Drug Discovery and Biomedical Sciences, University of South Carolina, Columbia, SC 29208
- Pelin Gülizar Ersan
  - Department of Molecular Biology and Genetics, Faculty of Science, Bilkent University, 06800 Ankara, Turkey
- Kathrin Trappe
  - Bioinformatics Unit (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics Unit (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Tamer T Önder
  - Koç University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey
  - School of Medicine, Koç University, 34450 Istanbul, Turkey
- Nurcan Tuncbag
  - Graduate School of Informatics, Department of Health Informatics, METU, 06800 Ankara, Turkey
  - Cancer Systems Biology Laboratory (CanSyL), METU, 06800 Ankara, Turkey
- Özgür Şahin
  - Department of Drug Discovery and Biomedical Sciences, University of South Carolina, Columbia, SC 29208
  - Department of Molecular Biology and Genetics, Faculty of Science, Bilkent University, 06800 Ankara, Turkey
- Nurhan Ozlu
  - Department of Molecular Biology and Genetics, Koç University, 34450 Istanbul, Turkey
  - Koç University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey
36
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022]
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular are able to take a shortcut via HGT, which enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy requires that the acceptor and donor organisms involved in the HGT be known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns, in terms of mapping coverage, of an acceptor and a donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in the investigation of an outbreak of methicillin-resistant Staphylococcus aureus.
Affiliation(s)
- Enrico Seiler
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Kathrin Trappe
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
37
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data, called metaproteogenomics, has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms, including methods of error rate estimation, and finally on the taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain, which are highlighted and discussed in this article. These include (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally (iv) the assessment and provision of metrics and tools for quantitative analysis.
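Error rate estimation for peptide identifications is most commonly done with the target-decoy approach: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score threshold serves as an estimate of the number of false target hits. The following Python sketch is a generic illustration of that standard technique, not a method taken from this review. It assumes a hypothetical list of (score, is_decoy) peptide-spectrum matches (PSMs) where higher scores are better, and it ignores score ties.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (search engine score, is_decoy); higher is better.
PSM = Tuple[float, bool]


def target_decoy_qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Annotate each PSM with a q-value estimated from target-decoy counts."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting every PSM scored at least this high.
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)
    # q-value: the lowest FDR at which this PSM would still be accepted.
    qvalues = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def accepted_targets(psms: List[PSM], max_q: float = 0.01) -> List[float]:
    """Scores of target PSMs accepted at the given q-value threshold."""
    return [s for s, decoy, q in target_decoy_qvalues(psms) if not decoy and q <= max_q]


if __name__ == "__main__":
    psms = [(50, False), (45, False), (40, True), (38, False), (30, True), (20, False)]
    for score, is_decoy, q in target_decoy_qvalues(psms):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
    print(accepted_targets(psms, max_q=0.4))
```

In metaproteomics, the very large and redundant search databases discussed above make such estimates harder to calibrate, which is one reason robust false-positive assessment is listed as an open challenge.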
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs
- FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
38
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/08/2016] [Indexed: 01/24/2023]
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments now offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has rarely been evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation of the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets, acquired on different instrument types using collision-induced dissociation (CID) and higher energy collisional dissociation (HCD) fragmentation, with the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated with peak intensity prediction software. We found that Novor shows the best overall performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, Novor ran around 12 to 17 times faster than the commercial competitor PEAKS. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, the evaluated algorithms, taken as a whole, perform moderately on experimental data but show significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy.
Finally, we discuss the potential for de novo sequencing to become more widely used in the field.
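The study scores predictions at the level of full peptides, sequence tags and single residues. As a hedged illustration of the first and last of these, the Python sketch below computes full-peptide accuracy and a simplified, position-wise residue accuracy against ground-truth sequences. Treating isoleucine and leucine as equivalent (they are isobaric) and comparing residues by position rather than by mass are assumptions of this sketch, not necessarily the procedure used in the paper; `normalise` and `denovo_accuracy` are hypothetical helpers.

```python
def normalise(peptide: str) -> str:
    """Upper-case a sequence and map isoleucine (I) to leucine (L); both have identical masses."""
    return peptide.upper().replace("I", "L")


def denovo_accuracy(predictions: list[str], ground_truth: list[str]) -> tuple[float, float]:
    """Return (full-peptide accuracy, position-wise residue accuracy) for paired sequences."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("need exactly one prediction per ground-truth peptide")
    full_hits = 0
    residue_hits = 0
    residue_total = 0
    for pred, truth in zip(predictions, ground_truth):
        p, t = normalise(pred), normalise(truth)
        full_hits += p == t  # the whole sequence must match exactly
        # zip stops at the shorter sequence, so surplus predicted residues earn nothing
        residue_hits += sum(a == b for a, b in zip(p, t))
        residue_total += len(t)
    return full_hits / len(ground_truth), residue_hits / residue_total
```

Position-wise comparison penalises a single inserted or deleted residue along the rest of the peptide, which is one reason mass-based residue matching is often preferred in real benchmarks.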
Affiliation(s)
- Thilo Muth
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
39
Rausch S, Midha A, Kuhring M, Affinass N, Radonic A, Kühl AA, Bleich A, Renard BY, Hartmann S. Parasitic Nematodes Exert Antimicrobial Activity and Benefit From Microbiota-Driven Support for Host Immune Regulation. Front Immunol 2018; 9:2282. [PMID: 30349532 PMCID: PMC6186814 DOI: 10.3389/fimmu.2018.02282] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/18/2018] [Accepted: 09/14/2018] [Indexed: 12/04/2022]
Abstract
Intestinal parasitic nematodes live in intimate contact with the host microbiota. Changes in the microbiome composition during nematode infection affect immune control of the parasites and shifts in the abundance of bacterial groups have been linked to the immunoregulatory potential of nematodes. Here we asked if the small intestinal parasite Heligmosomoides polygyrus produces factors with antimicrobial activity, senses its microbial environment and if the anti-nematode immune and regulatory responses are altered in mice devoid of gut microbes. We found that H. polygyrus excretory/secretory products exhibited antimicrobial activity against gram+/− bacteria. Parasites from germ-free mice displayed alterations in gene expression, comprising factors with putative antimicrobial functions such as chitinase and lysozyme. Infected germ-free mice developed increased small intestinal Th2 responses coinciding with a reduction in local Foxp3+RORγt+ regulatory T cells and decreased parasite fecundity. Our data suggest that nematodes sense their microbial surrounding and have evolved factors that limit the outgrowth of certain microbes. Moreover, the parasites benefit from microbiota-driven immune regulatory circuits, as an increased ratio of intestinal Th2 effector to regulatory T cells coincides with reduced parasite fitness in germ-free mice.
Affiliation(s)
- Sebastian Rausch
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
- Ankur Midha
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
- Mathias Kuhring
- Bioinformatics Unit (MF 1), Robert Koch Institute, Berlin, Germany; Core Unit Bioinformatics, Berlin Institute of Health (BIH), Berlin, Germany; Berlin Institute of Health Metabolomics Platform, Berlin Institute of Health (BIH), Berlin, Germany; Max Delbrück Center for Molecular Medicine, Berlin, Germany
- Nicole Affinass
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
- Aleksandar Radonic
- Centre for Biological Threats and Special Pathogens (ZBS 1), Robert Koch Institute, Berlin, Germany; Genome Sequencing Unit (MF 2), Robert Koch Institute, Berlin, Germany
- Anja A Kühl
- iPATH.Berlin, Core Unit for Immunopathology for Experimental Models, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
- André Bleich
- Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
- Susanne Hartmann
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
40
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use, the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
41
Ebner F, Kuhring M, Radonić A, Midha A, Renard BY, Hartmann S. Silent Witness: Dual-Species Transcriptomics Reveals Epithelial Immunological Quiescence to Helminth Larval Encounter and Fostered Larval Development. Front Immunol 2018; 9:1868. [PMID: 30158930 PMCID: PMC6104121 DOI: 10.3389/fimmu.2018.01868] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/01/2018] [Accepted: 07/30/2018] [Indexed: 11/17/2022]
Abstract
Gastrointestinal nematodes are among the most prevalent parasites infecting humans and livestock worldwide. Infective larvae of the soil-transmitted nematode Ascaris spp. enter the host and start tissue migration by crossing the intestinal epithelial barrier. The initial interaction of the intestinal epithelium with the parasite, however, has received little attention. In a time-resolved interaction model of porcine intestinal epithelial cells (IPEC-J2) and infective Ascaris suum larvae, we addressed the early transcriptional changes occurring simultaneously in both organisms using dual-species RNA-Seq. Functional analysis of the host response revealed an overall induction of metabolic activity without induction of immune responsive genes or immune signaling pathways, together with suppression of chemotactic genes such as CXCL8/IL-8 or CHI3L1. Ascaris larvae, upon contact with the epithelium, showed induction of genes that orchestrate motor activity and larval development, such as myosin, troponin, myoglobin, and protein disulfide isomerase 2 (PDI-2). In addition, excretory-secretory products that likely facilitate parasite invasion were increased, among them aspartic protease 6 and hyaluronidase. Integration of host and pathogen data in an interspecies gene co-expression network indicated links between nematode fatty acid biosynthesis and host ribosome assembly/protein synthesis. In summary, our study provides new molecular insights into the early factors of parasite invasion, while at the same time revealing host immunological unresponsiveness. Reproducible software for dual RNA-Seq analysis of non-model organisms is available at https://gitlab.com/mkuhring/project_asuum and can be applied to similar studies.
Affiliation(s)
- Friederike Ebner
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
- Mathias Kuhring
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany; Core Unit Bioinformatics, Berlin Institute of Health (BIH), Berlin, Germany; Berlin Institute of Health Metabolomics Platform, Berlin Institute of Health (BIH), Berlin, Germany; Max Delbrück Center (MDC) for Molecular Medicine, Berlin, Germany
- Aleksandar Radonić
- Center for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, Berlin, Germany
- Ankur Midha
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Susanne Hartmann
- Department of Veterinary Medicine, Institute of Immunology, Freie Universität Berlin, Berlin, Germany
42
Loka TP, Tausch SH, Dabrowski PW, Radonic A, Nitsche A, Renard BY. PriLive: privacy-preserving real-time filtering for next-generation sequencing. Bioinformatics 2018. [PMID: 29522157 DOI: 10.1093/bioinformatics/bty128] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Motivation In next-generation sequencing, re-identification of individuals and other privacy-breaching strategies can be applied even to anonymized data. This also holds true for applications in which human DNA is acquired as a by-product, e.g. for viral or metagenomic samples from a human host. Conventional data protection strategies, including cryptography and post-hoc filtering, can only be applied to the final, processed sequencing data. This can result in an insufficient level of data protection and a considerable delay in the further analysis workflow. Results We present PriLive, a novel tool for the automated removal of sensitive data while the sequencing machine is running. Human sequence information can thereby be detected and removed before it has been completely produced. This facilitates compliance with strict data protection regulations. Because PriLive causes almost no delay for further analyses, it is also clearly beneficial for applications other than data protection. Especially if the sequencing data are dominated by known background signals, PriLive considerably accelerates subsequent analyses, since only a fraction of the input data remains to be processed. Besides these conceptual advantages, PriLive achieves filtering results at least as accurate as conventional post-hoc filtering tools. Availability and implementation PriLive is open-source software available at https://gitlab.com/rki_bioinformatics/PriLive. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tobias P Loka
- Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure
- Simon H Tausch
- Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure; Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1)
- Piotr W Dabrowski
- Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure
- Aleksandar Radonic
- Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1); Genome Sequencing Unit (MF 2), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1)
- Bernhard Y Renard
- Bioinformatics Division (MF 1), Department for Methods Development and Research Infrastructure
43
Tausch SH, Strauch B, Andrusch A, Loka TP, Lindner MS, Nitsche A, Renard BY. LiveKraken - real-time metagenomic classification of illumina data. Bioinformatics 2018; 34:3750-3752. [DOI: 10.1093/bioinformatics/bty433] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/01/2018] [Accepted: 05/30/2018] [Indexed: 11/14/2022]
Affiliation(s)
- Simon H Tausch
- Bioinformatics Unit (MF1)
- Centre for Biological Threats and Special Pathogens (ZBS1), Robert Koch Institute, Berlin, Germany
- Andreas Andrusch
- Bioinformatics Unit (MF1)
- Centre for Biological Threats and Special Pathogens (ZBS1), Robert Koch Institute, Berlin, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens (ZBS1), Robert Koch Institute, Berlin, Germany
44
Abstract
Motivation Current metagenomics approaches allow the composition of microbial communities to be analyzed at high resolution. Important changes in composition are known to occur even at the strain level and to go hand in hand with changes in disease or ecological state. However, strain-level analysis poses specific challenges because highly similar genome sequences are present. Only a limited number of tools address taxa abundance estimation beyond the species level, and there is a strong need for dedicated tools for strain resolution and differential abundance testing. Methods We present DiTASiC (Differential Taxa Abundance including Similarity Correction) as a novel approach for quantification and differential assessment of individual taxa in metagenomics samples. We introduce a generalized linear model for the resolution of shared read counts, which cause a significant bias at the strain level. Further, we capture abundance estimation uncertainties, which play a crucial role in differential abundance analysis. A novel statistical framework is built, which integrates the abundance variance and infers abundance distributions for differential testing that is sensitive at the strain level. Results As a result, we obtain highly accurate abundance estimates down to the sub-strain level and enable fine-grained resolution of strain clusters. We demonstrate the relevance of read ambiguity resolution and integration of abundance uncertainties for differential analysis. Even small changes are detected accurately, and false positives are significantly reduced. Superior performance is shown on the latest benchmark sets of various complexities and in comparison to existing methods. Availability and Implementation DiTASiC code is freely available from https://rki_bioinformatics.gitlab.io/ditasic. Supplementary information Supplementary data are available at Bioinformatics online.
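DiTASiC resolves shared read counts with a generalized linear model. The Python sketch below only illustrates the underlying idea of correcting per-taxon counts for similarity between genomes, using a swapped-in technique: multiplicative expectation-maximization updates that fit a Poisson model with identity link and non-negative abundances. It is not DiTASiC's code. The function `resolve_shared_counts` and the similarity matrix are hypothetical; in practice such a matrix might be estimated by mapping simulated reads from each reference genome (an assumption).

```python
import numpy as np


def resolve_shared_counts(similarity, counts, n_iter=10_000, tol=1e-10):
    """Estimate per-taxon read abundances from counts inflated by shared reads.

    similarity[j, i]: assumed fraction of reads originating from taxon i that are
    counted for taxon j (the diagonal holds reads recovered by their own taxon).
    counts[j]: observed reads counted for taxon j (a shared read counts for every hit).
    Every column of `similarity` is assumed to have a positive sum.
    """
    A = np.asarray(similarity, dtype=float)
    c = np.asarray(counts, dtype=float)
    x = np.full(A.shape[1], c.sum() / A.shape[1])  # flat, positive starting point
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        expected = A @ x  # expected counts under the current abundances
        ratio = np.divide(c, expected, out=np.zeros_like(c), where=expected > 0)
        # Multiplicative update that increases the Poisson likelihood and keeps x >= 0
        x_new = x * (A.T @ ratio) / col_sums
        if np.max(np.abs(x_new - x)) <= tol * max(1.0, np.max(x)):
            return x_new
        x = x_new
    return x
```

For example, with two strains where 40% of the second strain's reads are also counted for the first and 30% of the first strain's reads for the second, naive counts of 1200 and 800 overestimate both strains; the corrected estimates approach 1000 and 500, the values used to construct the toy counts. The multiplicative form was chosen because it keeps abundances non-negative without an explicit constraint solver.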
Affiliation(s)
- Martina Fischer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Benjamin Strauch
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
45
Karayel Ö, Şanal E, Giese SH, Üretmen Kagıalı ZC, Polat AN, Hu CK, Renard BY, Tuncbag N, Özlü N. Comparative phosphoproteomic analysis reveals signaling networks regulating monopolar and bipolar cytokinesis. Sci Rep 2018; 8:2269. [PMID: 29396449 PMCID: PMC5797227 DOI: 10.1038/s41598-018-20231-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/11/2017] [Accepted: 01/05/2018] [Indexed: 01/21/2023]
Abstract
The successful completion of cytokinesis requires the coordinated activities of diverse cellular components, including membranes, cytoskeletal elements and chromosomes, that together form partly redundant pathways, depending on the cell type. The biochemical analysis of this process is challenging due to its dynamic and rapid nature. Here, we systematically compared monopolar and bipolar cytokinesis and demonstrated that monopolar cytokinesis is a good surrogate for cytokinesis and a well-suited system for global biochemical analysis in mammalian cells. Based on this, we established a phosphoproteomic signature of cytokinesis. More than 10,000 phosphorylation sites were systematically monitored; around 800 of those were up-regulated during cytokinesis. Reconstructing the kinase-substrate interaction network revealed 31 potentially active kinases during cytokinesis. The kinase-substrate network connects proteins between cytoskeleton, membrane and cell cycle machinery. We also found consensus motifs of phosphorylation sites that can serve as biochemical markers specific to cytokinesis. Beyond the kinase-substrate network, our reconstructed signaling network suggests that a combination of sumoylation and phosphorylation may regulate signaling pathways specific to monopolar cytokinesis. Our analysis provides a systematic approach for comparing different cytokinesis types, revealing alternative pathways and giving a global overview of how conserved genes work together to organize chromatin and cytoplasm during cytokinesis.
Affiliation(s)
- Özge Karayel
- Department of Molecular Biology and Genetics, Koç University, Istanbul, Turkey
- Erdem Şanal
- Department of Molecular Biology and Genetics, Koç University, Istanbul, Turkey
- Sven H Giese
- Bioinformatics Division (MF1), Robert Koch Institute, Berlin, Germany
- Chair of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Ayşe Nur Polat
- Department of Molecular Biology and Genetics, Koç University, Istanbul, Turkey
- Chi-Kuo Hu
- Department of Genetics, Stanford University, School of Medicine, CA, USA
- Bernhard Y Renard
- Bioinformatics Division (MF1), Robert Koch Institute, Berlin, Germany
- Nurcan Tuncbag
- Graduate School of Informatics, Department of Health Informatics, METU, Ankara, Turkey
- Cancer Systems Biology Laboratory (CanSyL), METU, Ankara, Turkey
- Nurhan Özlü
- Department of Molecular Biology and Genetics, Koç University, Istanbul, Turkey
- Koç University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
46
Jandrasits C, Dabrowski PW, Fuchs S, Renard BY. seq-seq-pan: building a computational pan-genome data structure on whole genome alignment. BMC Genomics 2018; 19:47. [PMID: 29334898 PMCID: PMC5769345 DOI: 10.1186/s12864-017-4401-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/06/2017] [Accepted: 12/19/2017] [Indexed: 12/15/2022]
Abstract
Background The increasing application of next-generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach of representing a species or a population with a single reference sequence and a set of variations cannot capture their full diversity and introduces bias towards the chosen reference. There is a need to represent multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change in available genomes. Results We introduce seq-seq-pan, a framework that provides methods for adding genomes to, or removing genomes from, a set of aligned genomes and uses these to construct a whole genome alignment. Throughout the sequential workflow, the alignment is optimized to generate a linear representation of the aligned set of genomes, which enables its use for annotation and in downstream analyses. Conclusions By providing dynamic updates and optimized processing, our approach enables the use of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4401-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Stephan Fuchs
- Robert Koch Institute, Wernigerode Branch, Burgstraße 37, Wernigerode, 38855, Germany
47
Muth T, Kohrs F, Heyer R, Benndorf D, Rapp E, Reichl U, Martens L, Renard BY. MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go. Anal Chem 2017; 90:685-689. [PMID: 29215871 PMCID: PMC5757220 DOI: 10.1021/acs.analchem.7b03544] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 01/02/2023]
Abstract
Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and result interpretation. To overcome these shortcomings, we introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Fabian Kohrs
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Robert Heyer
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany; Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Erdmann Rapp
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Udo Reichl
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany; Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Lennart Martens
- Department of Biochemistry, Ghent University, 9000 Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
48
Lindner MS, Strauch B, Schulze JM, Tausch SH, Dabrowski PW, Nitsche A, Renard BY. HiLive: real-time mapping of illumina reads while sequencing. Bioinformatics 2017; 33:917-919. [PMID: 27794555 DOI: 10.1093/bioinformatics/btw659] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/31/2016] [Accepted: 10/19/2016] [Indexed: 11/13/2022]
Abstract
Motivation Next-generation sequencing is increasingly used in time-critical clinical applications. While read mapping algorithms have always been optimized for speed, they follow a sequential paradigm and only start after the sequencing run has finished and the files have been converted. Since Illumina machines write intermediate output, HiLive performs read mapping while sequencing is still in progress and thereby drastically reduces the crucial overall sample analysis time, e.g. in precision medicine. Methods We present HiLive as a novel real-time read mapper that implements a k-mer based alignment strategy. HiLive continuously reads intermediate BCL files produced by Illumina sequencers and extends initial k-mer matches as the sequencer produces more data. Results We applied HiLive to real human transcriptome data to show that final read alignments are reported within a few minutes after the end of a full Illumina HiSeq 1500 run, whereas the conversion to FASTQ files alone, the standard input for current read mapping methods, takes roughly five times as long. Further, we show on simulated and real data that HiLive has accuracy comparable to recent read mappers. Availability and Implementation HiLive and its source code are freely available from https://gitlab.com/SimonHTausch/HiLive. Contact renardB@rki.de. Supplementary information Supplementary data are available at Bioinformatics online.
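To illustrate the idea of extending k-mer matches as new sequencing cycles arrive, here is a minimal Python sketch. It is not HiLive's implementation (HiLive parses BCL files and uses its own index and alignment logic); `build_kmer_index`, `LiveRead` and the `min_hits` threshold are hypothetical. Each cycle appends one base to a read; once k bases are available, the newest k-mer is looked up, and every exact reference hit votes for the alignment start position it implies.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


class LiveRead:
    """Track candidate alignment starts for one read while its bases arrive cycle by cycle."""

    def __init__(self, index: dict[str, list[int]], k: int):
        self.index = index
        self.k = k
        self.bases: list[str] = []
        # candidate start position on the reference -> number of supporting k-mers
        self.seed_hits: dict[int, int] = defaultdict(int)

    def add_cycle(self, base: str) -> None:
        """Append the base called in the newest cycle and update candidate positions."""
        self.bases.append(base)
        n = len(self.bases)
        if n < self.k:
            return  # not enough cycles yet for a complete k-mer
        kmer = "".join(self.bases[n - self.k:])
        offset = n - self.k  # position of this k-mer within the read
        for ref_pos in self.index.get(kmer, []):
            start = ref_pos - offset
            if start >= 0:
                self.seed_hits[start] += 1

    def best_candidates(self, min_hits: int) -> list[int]:
        """Return start positions supported by at least `min_hits` k-mers."""
        return sorted(s for s, h in self.seed_hits.items() if h >= min_hits)
```

For a read taken from position 4 of a toy reference, every cycle after the first k adds one more vote for start position 4, so the candidate becomes more certain as the run progresses. Counting exact-match votes is only a stand-in for seed extension; a real mapper would also tolerate mismatches and search the reverse strand.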
Affiliation(s)
- Martin S Lindner
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany
- Benjamin Strauch
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany
- Jakob M Schulze
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany
- Simon H Tausch
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany; Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Piotr W Dabrowski
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany; Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Research Group Bioinformatics (NG 4), Robert Koch Institute, Berlin, Germany
49
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1101/099127] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 05/25/2023]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
- Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Department of Pediatrics, University of California, San Diego, California, USA
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
- Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
- Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
- Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Department of Science and Environment, Roskilde University, Roskilde, Denmark
- Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France
- National Centre of the Scientific Research (CNRS), Rennes, France
- Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
- Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
- Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Algorizk-IT consulting and software systems, Paris, France
- Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
- Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
- Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
- Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
- Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
- Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
- Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
- Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
- Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
- Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
50
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1038/nmeth.4458] [Citation(s) in RCA: 430] [Impact Index Per Article: 61.4] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 12/12/2022]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
- Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Department of Pediatrics, University of California, San Diego, California, USA
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
- Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
- Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
- Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Department of Science and Environment, Roskilde University, Roskilde, Denmark
- Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France
- National Centre of the Scientific Research (CNRS), Rennes, France
- Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
- Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
- Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Algorizk-IT consulting and software systems, Paris, France
- Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
- Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
- Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
- Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
- Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
- Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
- Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
- Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
- Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
- Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
- Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)