51
Meyer P, Saez-Rodriguez J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst 2021; 12:636-653. [PMID: 34139170 DOI: 10.1016/j.cels.2021.05.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/17/2020] [Revised: 03/29/2021] [Accepted: 05/18/2021] [Indexed: 02/07/2023]
Abstract
Computational and mathematical models are key to obtaining a system-level understanding of biological processes, but their limitations have to be clearly defined to allow their proper application and interpretation. Crowdsourced benchmarks in the form of challenges provide an unbiased assessment of methods, and for the past decade, the Dialogue for Reverse Engineering Assessment and Methods (DREAM) has organized more than 15 systems biology challenges. From transcription factor binding to dynamical network models, from signaling networks to gene regulation, from whole-cell models to cell-lineage reconstruction, and from single-cell positioning in a tissue to drug combinations and cell survival, the scope is broad. To celebrate the 5-year anniversary of Cell Systems, we review the genesis of these systems biology challenges and discuss how interlocking the forward- and reverse-modeling paradigms allows us to push the frontier of systems biology. This approach will persist for systems-level approaches in biology and medicine.
Affiliation(s)
- Pablo Meyer
  - IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
- Julio Saez-Rodriguez
  - Institute for Computational Biomedicine, Heidelberg University Hospital and Heidelberg University, Faculty of Medicine, Bioquant, Heidelberg 69120, Germany
52
Buchka S, Hapfelmeier A, Gardner PP, Wilson R, Boulesteix AL. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biol 2021; 22:152. [PMID: 33975646 PMCID: PMC8111726 DOI: 10.1186/s13059-021-02365-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/20/2020] [Accepted: 04/23/2021] [Indexed: 12/03/2022] Open
Abstract
Most research articles presenting new data analysis methods claim that "the new method performs better than existing methods," but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants. We quantitatively investigate this bias using an example from epigenetic analysis: normalization methods for data generated by the Illumina HumanMethylation450K BeadChip microarray.
Affiliation(s)
- Stefan Buchka
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
- Alexander Hapfelmeier
  - Institute of Medical Informatics, Statistics and Epidemiology, School of Medicine, TUM, Munich, Germany
  - Institute of General Practice and Health Services Research, School of Medicine, TUM, Munich, Germany
- Paul P. Gardner
  - Department of Biochemistry, University of Otago, Otago, New Zealand
- Rory Wilson
  - Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
53
James SA, Yam WK. Sub-structure-based screening and molecular docking studies of potential enteroviruses inhibitors. Comput Biol Chem 2021; 92:107499. [PMID: 33932782 DOI: 10.1016/j.compbiolchem.2021.107499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Accepted: 04/21/2021] [Indexed: 11/15/2022]
Abstract
Rhinoviruses (RVs), especially human rhinoviruses (HRVs), are recognized as the most common cause of upper respiratory tract infections (URTIs). Pleconaril, a broad-spectrum anti-rhinoviral compound, has been used as a drug of choice for URTIs for over a decade. Unfortunately, owing to various complications associated with this drug, it was rejected, and a replacement is highly desirable. In silico screening and prediction methods such as sub-structure search and molecular docking have been widely used to identify alternative compounds. In our study, we have utilised sub-structure search to narrow down our quest in finding relevant chemical compounds. Molecular docking studies were then used to study their binding interaction at the molecular level. Interestingly, we have identified 3 residues that are worth further investigation in upcoming molecular dynamics simulations for their contribution to stable interaction.
Affiliation(s)
- Stephen Among James
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Selangor Darul Ehsan, Malaysia
  - Department of Biochemistry, Faculty of Science, Kaduna State University, 800211, Kaduna, Nigeria
- Wai Keat Yam
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Selangor Darul Ehsan, Malaysia
54
Kumar R, Sharma V, Suresh S, Ramrao DP, Veershetty A, Kumar S, Priscilla K, Hangargi B, Narasanna R, Pandey MK, Naik GR, Thomas S, Kumar A. Understanding Omics Driven Plant Improvement and de novo Crop Domestication: Some Examples. Front Genet 2021; 12:637141. [PMID: 33889179 PMCID: PMC8055929 DOI: 10.3389/fgene.2021.637141] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/02/2020] [Accepted: 03/02/2021] [Indexed: 01/07/2023] Open
Abstract
In the current era, one of the biggest challenges is to shorten the breeding cycle for rapid generation of a new crop variety having high yield capacity, disease resistance, high nutrient content, etc. Advances in the "-omics" technology have revolutionized the discovery of genes and bio-molecules with remarkable precision, resulting in significant development of plant-focused metabolic databases and resources. Metabolomics has been widely used in several model plants and crop species to examine metabolic drift and changes in metabolic composition during various developmental stages and in response to stimuli. Over the last few decades, these efforts have resulted in a significantly improved understanding of the metabolic pathways of plants through identification of several unknown intermediates. This has assisted in developing several new metabolically engineered important crops with desirable agronomic traits, and has facilitated the de novo domestication of new crops for sustainable agriculture and food security. In this review, we discuss how "omics" technologies, particularly metabolomics, have enhanced our understanding of important traits and allowed speedy domestication of novel crop plants.
Affiliation(s)
- Rakesh Kumar
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Vinay Sharma
  - International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Srinivas Suresh
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Akash Veershetty
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Sharan Kumar
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Kagolla Priscilla
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Rahul Narasanna
  - Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Manish Kumar Pandey
  - International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Sherinmol Thomas
  - Department of Biosciences & Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Anirudh Kumar
  - Department of Botany, Indira Gandhi National Tribal University, Amarkantak, India
55
Meyer F, Lesker TR, Koslicki D, Fritz A, Gurevich A, Darling AE, Sczyrba A, Bremges A, McHardy AC. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc 2021; 16:1785-1801. [PMID: 33649565 DOI: 10.1038/s41596-020-00480-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 03/10/2020] [Accepted: 11/26/2020] [Indexed: 01/31/2023]
Abstract
Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
Affiliation(s)
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till-Robin Lesker
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- David Koslicki
  - Computer Science and Engineering, Biology, and The Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Adrian Fritz
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Alexey Gurevich
  - Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
- Aaron E Darling
  - The ithree institute, University of Technology Sydney, Sydney, Australia
- Alexander Sczyrba
  - Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Andreas Bremges
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
56
Chelysheva I, Pollard AJ, O'Connor D. RNA2HLA: HLA-based quality control of RNA-seq datasets. Brief Bioinform 2021; 22:6184409. [PMID: 33758920 PMCID: PMC8425422 DOI: 10.1093/bib/bbab055] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/16/2020] [Revised: 01/16/2021] [Accepted: 02/02/2021] [Indexed: 12/19/2022] Open
Abstract
RNA-sequencing (RNA-seq) is a widely used approach for accessing the transcriptome in biomedical research. Studies frequently include multiple samples taken from the same individual at various time points or under different conditions; correct assignment of those samples to each particular participant is evidently of great importance. Here, we propose taking advantage of typing the highly polymorphic genes from the human leukocyte antigen (HLA) complex in order to verify the correct allocation of RNA-seq samples to individuals. We introduce RNA2HLA, a novel quality control (QC) tool for performing study-wide HLA-typing for RNA-seq data and thereby identifying the samples from a common source. RNA2HLA allows precise allocation and grouping of RNA samples based on their HLA types. Strikingly, RNA2HLA revealed wrongly assigned samples from publicly available datasets and thereby demonstrated the importance of this tool for the quality control of RNA-seq studies. In addition, our tool successfully extracts HLA alleles in four-digit resolution and can be used to perform massive HLA-typing from RNA-seq based studies, which will serve multiple research purposes beyond sample QC.
Affiliation(s)
- Irina Chelysheva
  - Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, UK
- Andrew J Pollard
  - Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, UK
- Daniel O'Connor
  - Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, UK
57
Abstract
Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.
Affiliation(s)
- David Posada
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
  - Galicia Sur Health Research Institute, Vigo, Spain
58
Haferlach T. The time has come for next-generation sequencing in routine diagnostic workup in hematology. Haematologica 2021; 106:659-661. [PMID: 33645944 PMCID: PMC7927880 DOI: 10.3324/haematol.2020.270504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Indexed: 01/24/2023] Open
59
Schmeing S, Robinson MD. ReSeq simulates realistic Illumina high-throughput sequencing data. Genome Biol 2021; 22:67. [PMID: 33608040 PMCID: PMC7896392 DOI: 10.1186/s13059-021-02265-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/17/2020] [Accepted: 01/07/2021] [Indexed: 12/18/2022] Open
Abstract
In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq.
Affiliation(s)
- Stephan Schmeing
  - Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Mark D Robinson
  - Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, Winterthurerstrasse 190, Zurich, 8057, Switzerland
60
Petrillo M, Fabbri M, Kagkli DM, Querci M, Van den Eede G, Alm E, Aytan-Aktug D, Capella-Gutierrez S, Carrillo C, Cestaro A, Chan KG, Coque T, Endrullat C, Gut I, Hammer P, Kay GL, Madec JY, Mather AE, McHardy AC, Naas T, Paracchini V, Peter S, Pightling A, Raffael B, Rossen J, Ruppé E, Schlaberg R, Vanneste K, Weber LM, Westh H, Angers-Loustau A. A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing. F1000Res 2021; 10:80. [PMID: 35847383 PMCID: PMC9243550 DOI: 10.12688/f1000research.39214.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 03/10/2022] [Indexed: 10/31/2024] Open
Abstract
Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain "live" (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines' implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.
Affiliation(s)
- Marco Fabbri
  - European Commission Joint Research Centre, Ispra, Italy
- Guy Van den Eede
  - European Commission Joint Research Centre, Ispra, Italy
  - European Commission Joint Research Centre, Geel, Belgium
- Erik Alm
  - The European Centre for Disease Prevention and Control, Stockholm, Sweden
- Derya Aytan-Aktug
  - National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Catherine Carrillo
  - Ottawa Laboratory – Carling, Canadian Food Inspection Agency, Ottawa, Ontario, Canada
- Kok-Gan Chan
  - International Genome Centre, Jiangsu University, Zhenjiang, China
  - Division of Genetics and Molecular Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Teresa Coque
  - Servicio de Microbiología, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
  - Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Carlos III Health Institute, Madrid, Spain
- Ivo Gut
  - Centro Nacional de Análisis Genómico, Centre for Genomic Regulation (CNAG-CRG), Barcelona Institute of Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Paul Hammer
  - BIOMES. NGS GmbH c/o Technische Hochschule Wildau, Wildau, Germany
- Gemma L. Kay
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Jean-Yves Madec
  - Unité Antibiorésistance et Virulence Bactériennes, ANSES Site de Lyon, Lyon, France
- Alison E. Mather
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - University of East Anglia, Norwich, UK
- Thierry Naas
  - French-NRC for CPEs, Service de Bactériologie-Hygiène, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
- Silke Peter
  - Institute of Medical Microbiology and Hygiene, University of Tübingen, Tübingen, Germany
- Arthur Pightling
  - Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
- John Rossen
  - Department of Medical Microbiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Robert Schlaberg
  - Department of Pathology, University of Utah, Salt Lake City, UT, USA
- Kevin Vanneste
  - Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium
- Lukas M. Weber
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
  - Present address: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
61
Rampler E, Abiead YE, Schoeny H, Rusz M, Hildebrand F, Fitz V, Koellensperger G. Recurrent Topics in Mass Spectrometry-Based Metabolomics and Lipidomics-Standardization, Coverage, and Throughput. Anal Chem 2021; 93:519-545. [PMID: 33249827 PMCID: PMC7807424 DOI: 10.1021/acs.analchem.0c04698] [Citation(s) in RCA: 98] [Impact Index Per Article: 24.5] [Indexed: 02/07/2023]
Affiliation(s)
- Evelyn Rampler
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Vienna Metabolomics Center (VIME), University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - University of Vienna, Althanstraße 14, 1090 Vienna, Austria
- Yasin El Abiead
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Harald Schoeny
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Mate Rusz
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Institute of Inorganic Chemistry, University of Vienna, Währinger Straße 42, 1090 Vienna, Austria
- Felina Hildebrand
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Veronika Fitz
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Gunda Koellensperger
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Vienna Metabolomics Center (VIME), University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - University of Vienna, Althanstraße 14, 1090 Vienna, Austria
62
Talikka M, Belcastro V, Boué S, Marescotti D, Hoeng J, Peitsch MC. Applying Systems Toxicology Methods to Drug Safety. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11522-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022] Open
63
Subkhankulova T, Naumenko F, Tolmachov OE, Orlov YL. Novel ChIP-seq simulating program with superior versatility: isChIP. Brief Bioinform 2020; 22:6035271. [PMID: 33320934 DOI: 10.1093/bib/bbaa352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/11/2020] [Revised: 10/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022] Open
Abstract
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is recognized as an extremely powerful tool to study the interaction of numerous transcription factors and other chromatin-associated proteins with DNA. The core problem in the optimization of the ChIP-seq protocol and the subsequent computational data analysis is that a 'true' pattern of binding events for a given protein factor is unknown. Computer simulation of the ChIP-seq process based on an 'a priori known binding template' can contribute to drastically reducing the number of wet lab experiments and ultimately help achieve radical optimization of the entire processing pipeline. We present a newly developed ChIP-sequencing simulation algorithm implemented in the novel software, in silico ChIP-seq (isChIP). We demonstrate that isChIP closely approximates real ChIP-seq protocols and is able to model data similar to those obtained from experimental sequencing. We validated isChIP using publicly available datasets generated for well-characterized transcription factors Oct4 and Sox2. Although the novel software is compatible with the Illumina protocols by default, it can also successfully perform simulations with a number of alternative sequencing platforms such as Roche 454, Ion Torrent and SOLiD, as well as model ChIP-exo. The versatility of isChIP was demonstrated through modelling a wide range of binding events, including those of transcription factors and chromatin modifiers. We also performed a comparative analysis against a few existing ChIP-seq simulators and showed the fundamental superiority of our model. Due to its ability to utilize known binding templates, isChIP can potentially be employed to help investigators choose the most appropriate analytical software through benchmarking of available ChIP-seq programs and optimize the experimental parameters of the ChIP-seq protocol. isChIP software is freely available at https://github.com/fnaumenko/isChIP.
Affiliation(s)
- Yuriy L Orlov
  - Digital Health Institute, I.M. Sechenov First Moscow State Medical University (Sechenov University), and Senior Scientist at Agrarian and Technological Institute, Peoples' Friendship University of Russia (RUDN University), Russia
64
Bebris K, Polaka I. An Overview of the Application of Deep Learning in Short-Read Sequence Classification. INFORMATION TECHNOLOGY AND MANAGEMENT SCIENCE 2020. [DOI: 10.7250/itms-2020-0005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Advances in sequencing technology have led to an ever-increasing amount of available short-read sequencing data. This has, consequently, exacerbated the need for efficient and precise classification tools that can be used in the analysis of these data. As it stands, recent years have shown that massive leaps in performance can be achieved when it comes to approaches that are based on heuristics, and apart from these improvements there has been an ever-increasing interest in applying deep learning techniques to revolutionize this classification task. We attempt to study these approaches and to evaluate their performance in a reproducible fashion to get a better perspective on the current state of deep-learning-based methods when it comes to the classification of short-read sequencing data.
65
Fonseka P, Pathan M, Chitti SV, Kang T, Mathivanan S. FunRich enables enrichment analysis of OMICs datasets. J Mol Biol 2020; 433:166747. [PMID: 33310018 DOI: 10.1016/j.jmb.2020.166747] [Citation(s) in RCA: 167] [Impact Index Per Article: 33.4] [Received: 09/04/2020] [Revised: 12/01/2020] [Accepted: 12/03/2020] [Indexed: 01/17/2023]
Abstract
High-throughput methods to profile the genome, transcriptome, proteome and metabolome of various systems have become routine in multiple research laboratories around the world. Hence, to analyse and interpret these heterogeneous datasets, user-friendly bioinformatics tools are needed. Here, we discuss the FunRich tool, which enables biologists to perform functional enrichment analysis on the generated datasets. Users can perform enrichment analysis with a variety of background databases and have complete control in updating or modifying the content in most of the databases. Specifically, users can download and update the background database from UniProt at any time, thereby allowing a robust background database that can support annotations from >18 taxonomies. Users can create customizable Venn diagrams, pie charts, bar graphs and heatmaps of publication quality for their datasets using FunRich (http://www.funrich.org). Overall, the FunRich tool is user-friendly and enables users to perform various analyses on their datasets with minimal or no aid from bioinformaticians.
Affiliation(s)
- Pamali Fonseka
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Mohashin Pathan
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Sai V Chitti
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Taeyoung Kang
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Suresh Mathivanan
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
66
Krassowski M, Das V, Sahu SK, Misra BB. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front Genet 2020; 11:610798. [PMID: 33362867 PMCID: PMC7758509 DOI: 10.3389/fgene.2020.610798] [Citation(s) in RCA: 177] [Impact Index Per Article: 35.4] [Received: 09/27/2020] [Accepted: 11/20/2020] [Indexed: 12/24/2022] Open
Abstract
Multi-omics, variously called integrated omics, pan-omics, and trans-omics, aims to combine two or more omics data sets to aid in data analysis, visualization and interpretation to determine the mechanism of a biological process. Multi-omics efforts have taken center stage in biomedical research leading to the development of new insights into biological events and processes. However, the mushrooming of a myriad of tools, datasets, and approaches tends to inundate the literature and overwhelm researchers new to the field. The aims of this review are to provide an overview of the current state of the field, inform on available reliable resources, discuss the application of statistics and machine/deep learning in multi-omics analyses, discuss findable, accessible, interoperable, reusable (FAIR) research, and point to best practices in benchmarking. Thus, we provide guidance to interested users of the domain by addressing challenges of the underlying biology, giving an overview of the available toolset, addressing common pitfalls, and acknowledging current methods' limitations. We conclude with practical advice and recommendations on software engineering and reproducibility practices to share a comprehensive awareness with new researchers in multi-omics for end-to-end workflow.
Affiliation(s)
- Michal Krassowski
- Nuffield Department of Women’s & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Vivek Das
- Novo Nordisk Research Center Seattle, Inc, Seattle, WA, United States
|
67
|
Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020; 18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Affiliation(s)
- Nicholas A. Bokulich
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
| | - Michal Ziemski
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
| | - Michael S. Robeson
- University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, AR, USA
|
68
|
MacNamara A, Nakic N, Amin Al Olama A, Guo C, Sieber KB, Hurle MR, Gutteridge A. Network and pathway expansion of genetic disease associations identifies successful drug targets. Sci Rep 2020; 10:20970. [PMID: 33262371 PMCID: PMC7708424 DOI: 10.1038/s41598-020-77847-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Accepted: 11/06/2020] [Indexed: 11/24/2022] Open
Abstract
Genetic evidence of disease association has often been used as a basis for selecting drug targets for complex common diseases. Likewise, the propagation of genetic evidence through gene or protein interaction networks has been shown to accurately infer novel disease associations at genes for which no direct genetic evidence can be observed. However, an empirical test of the utility of combining these approaches for drug discovery has been lacking. In this study, we examine genetic associations arising from an analysis of 648 UK Biobank GWAS and evaluate whether targets identified as proxies of direct genetic hits are enriched for successful drug targets, as measured by historical clinical trial data. We find that protein networks formed from specific functional linkages such as protein complexes and ligand–receptor pairs are suitable for even naïve guilt-by-association network propagation approaches. In addition, more sophisticated approaches applied to global protein–protein interaction networks and pathway databases also successfully retrieve targets enriched for clinically successful drug targets. We conclude that network propagation of genetic evidence can be used for drug target identification.
Affiliation(s)
| | | | | | - Cong Guo
- Human Genetics, GSK, Collegeville, PA, USA
|
69
|
|
70
|
Zhao S, Agafonov O, Azab A, Stokowy T, Hovig E. Accuracy and efficiency of germline variant calling pipelines for human genome data. Sci Rep 2020; 10:20222. [PMID: 33214604 PMCID: PMC7678823 DOI: 10.1038/s41598-020-77218-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022] Open
Abstract
Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, "synthetic-diploid" and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
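The F1-score used to compare the pipelines is the harmonic mean of precision and recall. The sketch below shows the arithmetic on toy call sets; the variant tuples and the function name are hypothetical, and real benchmarking tools apply more careful variant matching (for example, normalising indel representations) than the exact set comparison assumed here.

```python
def precision_recall_f1(called, truth):
    """Compare a set of called variants against a truth set.

    Each variant is a hashable key such as (chrom, pos, ref, alt).
    Exact key matching is a simplifying assumption of this sketch.
    """
    tp = len(called & truth)   # calls confirmed by the truth set
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # true variants that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (illustrative only).
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "GA"), ("chr2", 75, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
          ("chr2", 50, "G", "GA"), ("chr3", 10, "A", "T")}
precision, recall, f1 = precision_recall_f1(called, truth)
```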
Affiliation(s)
- Sen Zhao
- Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310, Oslo, Norway
| | | | - Abdulrahman Azab
- Center for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway
- Division of Research Computing, University Center for Information Technology (USIT), University of Oslo, 0316, Oslo, Norway
| | - Tomasz Stokowy
- Computational Biology Unit, Institute of Informatics, University of Bergen, 5008, Bergen, Norway
- Department of Clinical Science, University of Bergen, 5021, Bergen, Norway
| | - Eivind Hovig
- Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310, Oslo, Norway.
- Center for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway.
|
71
|
Knoll M, Furkel J, Debus J, Abdollahi A, Karch A, Stock C. An R package for an integrated evaluation of statistical approaches to cancer incidence projection. BMC Med Res Methodol 2020; 20:257. [PMID: 33059585 PMCID: PMC7559591 DOI: 10.1186/s12874-020-01133-5] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2020] [Accepted: 09/24/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Projection of future cancer incidence is an important task in cancer epidemiology. The results are of interest also for biomedical research and public health policy. Age-Period-Cohort (APC) models, usually based on long-term cancer registry data (> 20 yrs), are established for such projections. In many countries (including Germany), however, nationwide long-term data are not yet available. General guidance on statistical approaches for projections using rather short-term data is challenging and software to enable researchers to easily compare approaches is lacking. Methods: To enable a comparative analysis of the performance of statistical approaches to cancer incidence projection, we developed an R package (incAnalysis), supporting in particular Bayesian models fitted by Integrated Nested Laplace Approximations (INLA). Its use is demonstrated by an extensive empirical evaluation of operating characteristics (bias, coverage and precision) of potentially applicable models differing by complexity. Observed long-term data from three cancer registries (SEER-9, NORDCAN, Saarland) was used for benchmarking. Results: Overall, coverage was high (mostly > 90%) for Bayesian APC models (BAPC), whereas less complex models showed differences in coverage dependent on projection-period. Intercept-only models yielded values below 20% for coverage. Bias increased and precision decreased for longer projection periods (> 15 years) for all except intercept-only models. Precision was lowest for complex models such as BAPC models, generalized additive models with multivariate smoothers and generalized linear models with age x period interaction effects. Conclusion: The incAnalysis R package allows a straightforward comparison of cancer incidence rate projection approaches. Further detailed and targeted investigations into model performance in addition to the presented empirical results are recommended to derive guidance on appropriate statistical projection methods in a given setting.
Affiliation(s)
- Maximilian Knoll
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany.,Faculty of Biosciences, Heidelberg University, Heidelberg, Germany.,Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.,German Cancer Consortium (DKTK) Core Center Heidelberg, Heidelberg, Germany.
| | - Jennifer Furkel
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany.,Faculty of Biosciences, Heidelberg University, Heidelberg, Germany.,Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.,German Cancer Consortium (DKTK) Core Center Heidelberg, Heidelberg, Germany
| | - Jürgen Debus
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany.,Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.,German Cancer Consortium (DKTK) Core Center Heidelberg, Heidelberg, Germany
| | - Amir Abdollahi
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany.,Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.,German Cancer Consortium (DKTK) Core Center Heidelberg, Heidelberg, Germany
| | - André Karch
- Institute of Epidemiology and Social Medicine, University of Muenster, Albert-Schweitzer-Campus 1, 48149, Muenster, Germany
| | - Christian Stock
- Institute of Medical Biometry and Informatics (IMBI), University of Heidelberg, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany.,Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
72
|
LaPierre N, Alser M, Eskin E, Koslicki D, Mangul S. Metalign: efficient alignment-based metagenomic profiling via containment min hash. Genome Biol 2020; 21:242. [PMID: 32912225 PMCID: PMC7488264 DOI: 10.1186/s13059-020-02159-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Accepted: 08/26/2020] [Indexed: 12/31/2022] Open
Abstract
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
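The pre-filtering idea can be illustrated with a simplified containment estimate: take the k-mers of a reference with the smallest hash values and measure what fraction of them also occur in the sample. This is only a sketch under stated assumptions, not Metalign's implementation; the function names, the choice of k, the sketch size, and the use of an exact Python set for the sample (where a real tool would typically use a compact probabilistic structure) are all hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash, stable across Python runs."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def containment_estimate(reference, sample, k=4, sketch_size=50):
    """Estimate |kmers(reference) & kmers(sample)| / |kmers(reference)|
    from the `sketch_size` reference k-mers with the smallest hashes."""
    ref_kmers = kmers(reference, k)
    sample_kmers = kmers(sample, k)
    sketch = sorted(ref_kmers, key=stable_hash)[:sketch_size]
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in sample_kmers)
    return hits / len(sketch)


# Toy sequences (illustrative only): the first reference is fully
# contained in the sample, the second shares no k-mers with it.
contained = containment_estimate("ACGTACGT", "TTACGTACGTGG")
absent = containment_estimate("AAAAAA", "CCCCCC")
```

A reference whose estimate falls below some threshold could then be dropped before the more expensive alignment step; the threshold itself would need tuning and is not specified in the abstract.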
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, CA, 90095, USA.
| | - Mohammed Alser
- Department of Computer Science, ETH Zurich, Rämistrasse 101, CH-8092, Zurich, Switzerland
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Human Genetics, University of California, Los Angeles, CA, 90095, USA
| | - David Koslicki
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.
- Department of Biology, The Pennsylvania State University, University Park, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA.
| | - Serghei Mangul
- Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.
|
73
|
Seppey M, Manni M, Zdobnov EM. LEMMI: a continuous benchmarking platform for metagenomics classifiers. Genome Res 2020; 30:1208-1216. [PMID: 32616517 PMCID: PMC7462069 DOI: 10.1101/gr.260398.119] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Accepted: 06/25/2020] [Indexed: 11/24/2022]
Abstract
Studies of microbiomes are booming, along with the diversity of computational approaches to make sense out of the sequencing data and the volumes of accumulated microbial genotypes. A swift evaluation of newly published methods and their improvements against established tools is necessary to reduce the time between the methods' release and their adoption in microbiome analyses. The LEMMI platform offers a novel approach for benchmarking software dedicated to metagenome composition assessments based on read classification. It enables the integration of newly published methods in an independent and centralized benchmark designed to be continuously open to new submissions. This allows developers to be proactive regarding comparative evaluations and guarantees that any promising methods can be assessed side by side with established tools quickly after their release. Moreover, LEMMI enforces an effective distribution through software containers to ensure long-term availability of all methods. Here, we detail the LEMMI workflow and discuss the performances of some previously unevaluated tools. We see this platform eventually as a community-driven effort in which method developers can showcase novel approaches and get unbiased benchmarks for publications, and users can make informed choices and obtain standardized and easy-to-use tools.
Affiliation(s)
- Mathieu Seppey
- Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
| | - Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
| | - Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
|
74
|
Lamprecht AL, Garcia L, Kuzak M, Martinez C, Arcila R, Martin Del Pico E, Dominguez Del Angel V, van de Sandt S, Ison J, Martinez PA, McQuilton P, Valencia A, Harrow J, Psomopoulos F, Gelpi JL, Chue Hong N, Goble C, Capella-Gutierrez S. Towards FAIR principles for research software. Data Science 2020. [DOI: 10.3233/ds-190026] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
| | - Leyla Garcia
- ZBMED Information Centre for Life Sciences, Germany
| | - Mateusz Kuzak
- Netherlands eScience Center, The Netherlands
- Dutch Techcentre for Life Sciences, The Netherlands
| | | | | | | | | | | | - Jon Ison
- National Life Science Supercomputing Center, Technical University of Denmark, Denmark
| | | | | | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
| | | | | | - Josep Ll. Gelpi
- Barcelona Supercomputing Center (BSC), Spain
- University of Barcelona, Spain
| | - Neil Chue Hong
- Software Sustainability Institute, UK
- EPCC, University of Edinburgh, UK
|
75
|
Gao Z, Ding R, Zhai X, Wang Y, Chen Y, Yang CX, Du ZQ. Common Gene Modules Identified for Chicken Adiposity by Network Construction and Comparison. Front Genet 2020; 11:537. [PMID: 32547600 PMCID: PMC7272656 DOI: 10.3389/fgene.2020.00537] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2019] [Accepted: 05/04/2020] [Indexed: 12/12/2022] Open
Abstract
Excessive fat deposition can cause health problems in chickens and reduce production efficiency, causing great economic losses to the industry. However, the molecular underpinnings of the complex adiposity trait remain elusive. In the current study, we constructed and compared the gene co-expression networks on four transcriptome profiling datasets, from two chicken lines under divergent selection for abdominal fat content, in an attempt to dissect network compositions underlying adipose tissue growth and development. After functional enrichment analysis, nine network modules important to adipogenesis were discovered to be involved in lipid metabolism, PPAR and insulin signaling pathways, and contained hub genes related to adipogenesis, cell cycle, inflammation, and protein synthesis. Moreover, after additional functional annotation and network module comparisons, common sub-modules of similar functionality for chicken fat deposition were identified for different chicken lines, apart from modules specific to each chicken line. We further validated the lysosome pathway, and found that TFEB and its downstream target genes showed similar expression patterns along with chicken preadipocyte differentiation. Our findings could provide novel insights into the genetic basis of complex adiposity traits, as well as human obesity and related metabolic diseases.
Affiliation(s)
- Zhuoran Gao
- College of Animal Science, Yangtze University, Jingzhou, China.,College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Ran Ding
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Xiangyun Zhai
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Yuhao Wang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Yaofeng Chen
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Cai-Xia Yang
- College of Animal Science, Yangtze University, Jingzhou, China.,College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Zhi-Qiang Du
- College of Animal Science, Yangtze University, Jingzhou, China.,College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
|
76
|
Carr VR, Shkoporov A, Hill C, Mullany P, Moyes DL. Probing the Mobilome: Discoveries in the Dynamic Microbiome. Trends Microbiol 2020; 29:158-170. [PMID: 32448763 DOI: 10.1016/j.tim.2020.05.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2020] [Revised: 04/30/2020] [Accepted: 05/05/2020] [Indexed: 02/06/2023]
Abstract
There has been an explosion of metagenomic data representing human, animal, and environmental microbiomes. This provides an unprecedented opportunity for comparative and longitudinal studies of many functional aspects of the microbiome that go beyond taxonomic classification, such as profiling genetic determinants of antimicrobial resistance, interactions with the host, potentially clinically relevant functions, and the role of mobile genetic elements (MGEs). One of the most important but least studied of these aspects is the MGEs, collectively referred to as the 'mobilome'. Here we elaborate on the benefits and limitations of using different metagenomic protocols, discuss the relative merits of various sequencing technologies, and highlight relevant bioinformatics tools and pipelines to predict the presence of MGEs and their microbial hosts.
Affiliation(s)
- Victoria R Carr
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral and Craniofacial Sciences, King's College London, London, UK; The Alan Turing Institute, British Library, London, UK.
| | - Andrey Shkoporov
- APC Microbiome Ireland, School of Microbiology, University College Cork, Cork, Ireland
| | - Colin Hill
- APC Microbiome Ireland, School of Microbiology, University College Cork, Cork, Ireland
| | - Peter Mullany
- Eastman Dental Institute, University College London, London, UK
| | - David L Moyes
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral and Craniofacial Sciences, King's College London, London, UK.
|
77
|
Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020; 21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Affiliation(s)
- Keith Mitchell
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Jaqueline J Brito
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Igor Mandric
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Qiaozhen Wu
- Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Lana S Martin
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Aaron Karlsberg
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Ekaterina Gerasimov
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Russell Littman
- UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
| | - Brian L Hill
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Kevin Hsieh
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Linus Chen
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Eli Littman
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Taylor Shabani
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - German Enik
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Douglas Yao
- Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Jan Schroeder
- Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia, 119991
| | - Pavel Skums
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Mihai Pop
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
|
78
|
Browne P, Van Der Wal L, Gourmelon A. OECD approaches and considerations for regulatory evaluation of endocrine disruptors. Mol Cell Endocrinol 2020; 504:110675. [PMID: 31830512 DOI: 10.1016/j.mce.2019.110675] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 11/20/2019] [Accepted: 12/02/2019] [Indexed: 12/18/2022]
Abstract
Identifying the potential endocrine disruptor hazard of environmental chemicals is a regulatory mandate for many countries. However, due to the adaptive nature of the endocrine system, absence of a single method capable of identifying endocrine disruption, and the latency between exposure to endocrine disrupting chemicals during sensitive life stages and the manifestation of adverse responses, satisfying the regulatory requirement needed to identify a chemical as an endocrine disruptor is a challenge. There are now a variety of validated regulatory tests that can be used in combination to provide evidence that a chemical affects the oestrogen, androgen, thyroid, and steroidogenic pathways of vertebrates, but most rely (at least to some extent) on animal testing and require considerable cost and time to produce the necessary data. Emerging research methods are able to evaluate other endocrine pathways, incorporate more sensitive endpoints, and combine multiple alternative methods to predict in vivo outcomes. Some research approaches may also bridge gaps that have been identified in current endocrine regulatory testing. For the near term, considering new endpoints in a regulatory context may require adding them to existing test methods in order to establish relationships between the traditional and the innovative. From the outset, endocrine testing has always required integration of multiple methods that provide data on different levels of biological organisation; thus, the area of endocrine disruption is particularly adaptable to adverse outcome pathway (AOP) frameworks and integrated test methods built around AOPs. Herein, we provide a review of the status of endocrine disruptors in the OECD context, examples where innovation from research is needed to improve or bridge gaps in endocrine testing, and suggestions for regulators and researchers to facilitate uptake of innovative methods for endocrine disruptor regulatory testing. The increase in several complex human disorders that include an endocrine component and the alarming decrease in wildlife biodiversity are commanding directives to include the best, most informative, innovative approaches to accelerate the rate and throughput of chemical evaluation for endocrine disruption.
Affiliation(s)
- Patience Browne
- Organisation for Economic Cooperation and Development, Environment Directorate, Paris, France.
| | - Leon Van Der Wal
- Organisation for Economic Cooperation and Development, Environment Directorate, Paris, France
| | - Anne Gourmelon
- Organisation for Economic Cooperation and Development, Environment Directorate, Paris, France
|
79
|
|
80
|
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, Tausch SH, Malorny B. Typing methods based on whole genome sequencing data. ONE HEALTH OUTLOOK 2020; 2:3. [PMID: 33829127 PMCID: PMC7993478 DOI: 10.1186/s42522-020-0010-1] [Citation(s) in RCA: 115] [Impact Index Per Article: 23.0] [Received: 10/04/2019] [Accepted: 01/08/2020] [Indexed: 05/12/2023]
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, and environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity of complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, starting from simple gene-searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose between a great variety of tools for the analysis of WGS data. In this review, we describe the relevant phylogenomic approaches for outbreak studies and give an overview of selected tools for the characterization of foodborne pathogens based on WGS data. Despite the efforts of recent years, harmonization and standardization of typing tools are still urgently needed to allow for an easy comparison of data between laboratories, moving towards a One Health worldwide surveillance system for foodborne pathogens.
Affiliation(s)
- Laura Uelze
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Josephine Grützke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Maria Borowiak
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Jens Andre Hammerl
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Katharina Juraschek
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Carlus Deneke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Simon H. Tausch
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Burkhard Malorny
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
|
81
|
Rendueles O. Deciphering the role of the capsule of Klebsiella pneumoniae during pathogenesis: A cautionary tale. Mol Microbiol 2020; 113:883-888. [PMID: 31997409 PMCID: PMC7317218 DOI: 10.1111/mmi.14474] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/09/2020] [Revised: 01/20/2020] [Accepted: 01/20/2020] [Indexed: 01/31/2023]
Abstract
Extracellular capsule polysaccharides increase cellular fitness under abiotic stresses and during competition with other bacteria. They are best known for their role in virulence, particularly in human hosts. Specifically, capsules facilitate tissue invasion by enhancing bacterial evasion from phagocytosis and protect cells from biocidal molecules. Klebsiella pneumoniae is a worrisome nosocomial pathogen with few known virulence factors, but the most important one is its capsule. In this issue, Tan et al. assess the fitness advantage of the capsule by competing a wild-type strain against four different mutants where capsule production is interrupted at different stages of the biosynthetic pathway. Strikingly, not all mutants provide a fitness advantage. They suggest that some mutants have secondary defects altering virulence-associated phenotypes and blurring the role of the capsule in pathogenesis. This study indicates that the K1 capsule in K. pneumoniae is not required for gut colonization but that it is critical for bloodstream dissemination to other organs. These results help clarify the contradictory literature on the role of the Klebsiella capsule during infection. Finally, the varying fitness effects of different capsule mutations observed for K. pneumoniae K1 might also apply to other capsulated diderm bacteria that are facultative or emerging pathogens.
Affiliation(s)
- Olaya Rendueles
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
|
82
|
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 687] [Impact Index Per Article: 137.4] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
| | - Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
| | - Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
| | - Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
| | - Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | | | - Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
| | - Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
| | - Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
| | - Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
| | - Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
| | - Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
| | - Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
| | - John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
| | - Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
| | - Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
| | - Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
|
83
|
Kreutz C. Guidelines for benchmarking of optimization-based approaches for fitting mathematical models. Genome Biol 2019; 20:281. [PMID: 31842943 PMCID: PMC6915982 DOI: 10.1186/s13059-019-1887-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/01/2019] [Accepted: 11/13/2019] [Indexed: 11/10/2022] Open
Abstract
Insufficient performance of optimization-based approaches for the fitting of mathematical models is still a major bottleneck in systems biology. In this article, the reasons and methodological challenges are summarized as well as their impact in benchmark studies. Important aspects for achieving an increased level of evidence for benchmark results are discussed. Based on general guidelines for benchmarking in computational biology, a collection of tailored guidelines is presented for performing informative and unbiased benchmarking of optimization-based fitting approaches. Comprehensive benchmark studies based on these recommendations are urgently required for the establishment of a robust and reliable methodology for the systems biology community.
Affiliation(s)
- Clemens Kreutz
- Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany.
- CIBSS-Centre for Integrative Biological Signalling Studies, University of Freiburg, Freiburg, Germany.
|
84
|
Zheng H, Brennan K, Hernaez M, Gevaert O. Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples. Gigascience 2019; 8:giz145. [PMID: 31808800 PMCID: PMC6897288 DOI: 10.1093/gigascience/giz145] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/08/2019] [Revised: 09/30/2019] [Accepted: 11/15/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification. RESULTS: In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods. CONCLUSIONS: Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.
Affiliation(s)
- Hong Zheng
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, 1265 Welch Road, Stanford, 94305, CA, USA
| | - Kevin Brennan
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, 1265 Welch Road, Stanford, 94305, CA, USA
| | - Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 W. Gregory Dr, Urbana, 61805, IL, USA
| | - Olivier Gevaert
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, 1265 Welch Road, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford, 94305, CA, USA
|
85
|
Robinson MD, Vitek O. Benchmarking comes of age. Genome Biol 2019; 20:205. [PMID: 31597556 PMCID: PMC6785869 DOI: 10.1186/s13059-019-1846-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 11/25/2022] Open
Affiliation(s)
- Mark D Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland.
| | - Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
|
86
|
Weber LM, Soneson C. HDCytoData: Collection of high-dimensional cytometry benchmark datasets in Bioconductor object formats. F1000Res 2019; 8:1459. [PMID: 31857895 PMCID: PMC6904983 DOI: 10.12688/f1000research.20210.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 08/09/2019] [Indexed: 06/16/2024] Open
Abstract
Benchmarking is a crucial step during computational analysis and method development. Recently, a number of new methods have been developed for analyzing high-dimensional cytometry data. However, it can be difficult for analysts and developers to find and access well-characterized benchmark datasets. Here, we present HDCytoData, a Bioconductor package providing streamlined access to several publicly available high-dimensional cytometry benchmark datasets. The package is designed to be extensible, allowing new datasets to be contributed by ourselves or other researchers in the future. Currently, the package includes a set of experimental and semi-simulated datasets, which have been used in our previous work to evaluate methods for clustering and differential analyses. Datasets are formatted into standard SummarizedExperiment and flowSet Bioconductor object formats, which include complete metadata within the objects. Access is provided through Bioconductor's ExperimentHub interface. The package is freely available from http://bioconductor.org/packages/HDCytoData.
Affiliation(s)
- Lukas M. Weber
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
|
87
|
Weber LM, Soneson C. HDCytoData: Collection of high-dimensional cytometry benchmark datasets in Bioconductor object formats. F1000Res 2019; 8:1459. [PMID: 31857895 PMCID: PMC6904983 DOI: 10.12688/f1000research.20210.2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 08/09/2019] [Indexed: 12/04/2022] Open
Abstract
Benchmarking is a crucial step during computational analysis and method development. Recently, a number of new methods have been developed for analyzing high-dimensional cytometry data. However, it can be difficult for analysts and developers to find and access well-characterized benchmark datasets. Here, we present HDCytoData, a Bioconductor package providing streamlined access to several publicly available high-dimensional cytometry benchmark datasets. The package is designed to be extensible, allowing new datasets to be contributed by ourselves or other researchers in the future. Currently, the package includes a set of experimental and semi-simulated datasets, which have been used in our previous work to evaluate methods for clustering and differential analyses. Datasets are formatted into standard SummarizedExperiment and flowSet Bioconductor object formats, which include complete metadata within the objects. Access is provided through Bioconductor's ExperimentHub interface. The package is freely available from http://bioconductor.org/packages/HDCytoData.
Affiliation(s)
- Lukas M Weber
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
|
88
|
Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, Boulesteix AL, Saeys Y, Robinson MD. Essential guidelines for computational method benchmarking. Genome Biol 2019; 20:125. [PMID: 31221194 PMCID: PMC6584985 DOI: 10.1186/s13059-019-1738-8] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 12/14/2022] Open
Abstract
In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.
Affiliation(s)
- Lukas M Weber
- Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland
| | - Wouter Saelens
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
| | - Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland
- Present address: Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
| | - Alexander Hapfelmeier
- Institute of Medical Informatics, Statistics and Epidemiology, Technical University of Munich, 81675, Munich, Germany
| | - Paul P Gardner
- Department of Biochemistry, University of Otago, Dunedin, 9016, New Zealand
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, 81377, Munich, Germany
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium.
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium.
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland.
|