1
Pan L, Mou T, Huang Y, Hong W, Yu M, Li X. Ursa: A Comprehensive Multiomics Toolbox for High-Throughput Single-Cell Analysis. Mol Biol Evol 2023; 40:msad267. [PMID: 38091963 PMCID: PMC10752348 DOI: 10.1093/molbev/msad267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/08/2023] [Accepted: 11/03/2023] [Indexed: 12/28/2023]
Abstract
The burgeoning amount of single-cell data has been accompanied by revolutionary changes to computational methods to map, quantify, and analyze the outputs of these cutting-edge technologies. Many are still unable to reap the benefits of these advancements due to the lack of bioinformatics expertise. To address this issue, we present Ursa, an automated single-cell multiomics R package containing 6 automated single-cell omics and spatial transcriptomics workflows. Ursa allows scientists to carry out post-quantification single or multiomics analyses in genomics, transcriptomics, epigenetics, proteomics, and immunomics at the single-cell level. It serves as a one-stop analytic solution by providing users with outcomes of quality control assessments, multidimensional analyses such as dimension reduction and clustering, and extended analyses such as pseudotime trajectory and gene-set enrichment analyses. Ursa aims to bridge the gap between those with bioinformatics expertise and those without by providing an easy-to-use bioinformatics package for scientists, in the hope of accelerating their research potential. Ursa is freely available at https://github.com/singlecellomics/ursa.
Affiliation(s)
- Lu Pan
  - Institute of Environmental Medicine, Karolinska Institutet, Solna 171 65, Sweden
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna 171 65, Sweden
- Tian Mou
  - School of Biomedical Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Yue Huang
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna 171 65, Sweden
- Weifeng Hong
  - Department of Radiation Oncology, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Min Yu
  - Department of General Surgery, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong 510515, China
- Xuexin Li
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Solna 171 65, Sweden
  - Department of General Surgery, The Fourth Affiliated Hospital, China Medical University, Shenyang 110032, China
2
Laajala E, Halla-Aho V, Grönroos T, Kalim UU, Vähä-Mäkilä M, Nurmio M, Kallionpää H, Lietzén N, Mykkänen J, Rasool O, Toppari J, Orešič M, Knip M, Lund R, Lahesmaa R, Lähdesmäki H. Permutation-based significance analysis reduces the type 1 error rate in bisulfite sequencing data analysis of human umbilical cord blood samples. Epigenetics 2022; 17:1608-1627. [PMID: 35246015 DOI: 10.1080/15592294.2022.2044127] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/03/2022]
Abstract
DNA methylation patterns are largely established in-utero and might mediate the impacts of in-utero conditions on later health outcomes. Associations between perinatal DNA methylation marks and pregnancy-related variables, such as maternal age and gestational weight gain, have previously been studied with methylation microarrays, which typically cover less than 2% of human CpG sites. To detect such associations outside these regions, we chose the bisulphite sequencing approach. We collected and curated clinical data on 200 newborn infants, whose umbilical cord blood samples were analysed with the reduced representation bisulphite sequencing (RRBS) method. A generalized linear mixed-effects model was fit for each high coverage CpG site, followed by spatial and multiple testing adjustment of P values to identify differentially methylated cytosines (DMCs) and regions (DMRs) associated with clinical variables, such as maternal age, mode of delivery, and birth weight. The type 1 error rate was then evaluated with a permutation analysis. We discovered a strong inflation of spatially adjusted P values through the permutation analysis, which we then applied for empirical type 1 error control. The inflation of P values was caused by a common method for spatial adjustment and DMR detection, implemented in the tools comb-p and RADMeth. Based on empirically estimated significance thresholds, very little differential methylation was associated with any of the studied clinical variables, other than sex. With this analysis workflow, the sex-associated differentially methylated regions were highly reproducible across studies, technologies, and statistical models.
Affiliation(s)
- Essi Laajala
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
  - Turku Doctoral Programme of Molecular Medicine, University of Turku, Turku, Finland
  - Department of Computer Science, Aalto University, Espoo, Finland
- Viivi Halla-Aho
  - Department of Computer Science, Aalto University, Espoo, Finland
- Toni Grönroos
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
- Ubaid Ullah Kalim
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
- Mari Vähä-Mäkilä
  - Research Centre for Integrative Physiology and Pharmacology, Institute of Biomedicine, University of Turku, Turku, Finland
- Mirja Nurmio
  - Research Centre for Integrative Physiology and Pharmacology, Institute of Biomedicine, University of Turku, Turku, Finland
- Henna Kallionpää
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Niina Lietzén
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Juha Mykkänen
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Omid Rasool
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
- Jorma Toppari
  - Research Centre for Integrative Physiology and Pharmacology, Institute of Biomedicine, University of Turku, Turku, Finland
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Department of Pediatrics, Turku University Hospital, Turku, Finland
- Matej Orešič
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
  - School of Medical Sciences, Örebro University, Örebro, Sweden
- Mikael Knip
  - Pediatric Research Center, Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Center for Child Health Research, Tampere University Hospital, Tampere, Finland
- Riikka Lund
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Riitta Lahesmaa
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
  - Institute of Biomedicine, University of Turku, Turku, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, Espoo, Finland
3
Cacciabue M, Currá A, Carrillo E, König G, Gismondi MI. A beginner's guide for FMDV quasispecies analysis: sub-consensus variant detection and haplotype reconstruction using next-generation sequencing. Brief Bioinform 2020; 21:1766-1775. [PMID: 31697321 PMCID: PMC7110011 DOI: 10.1093/bib/bbz086] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/18/2019] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 12/18/2022]
Abstract
Deep sequencing of viral genomes is a powerful tool to study RNA virus complexity. However, the analysis of next-generation sequencing data can be challenging for researchers who have never approached the study of viral quasispecies by this methodology. In this work we present a suitable and affordable guide to explore the sub-consensus variability and to reconstruct viral quasispecies from Illumina sequencing data. The guide includes a complete analysis pipeline along with user-friendly descriptions of software and file formats. In addition, we assessed the feasibility of the proposed workflow by analyzing a set of foot-and-mouth disease viruses (FMDV) with different degrees of variability. This guide introduces the analysis of quasispecies of FMDV and other viruses through this kind of approach.
Affiliation(s)
- Marco Cacciabue
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
  - Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
- Anabella Currá
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
  - Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
- Elisa Carrillo
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Guido König
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- María Inés Gismondi
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
  - Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
4
Merino GA, Conesa A, Fernández EA. A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies. Brief Bioinform 2019; 20:471-481. [PMID: 29040385 DOI: 10.1093/bib/bbx122] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/04/2017] [Revised: 08/20/2017] [Indexed: 12/16/2022]
Abstract
Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups: those finding changes in the absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have considered changes in absolute and relative isoform expression levels separately. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.
Affiliation(s)
- Ana Conesa
  - Microbiology and Cell Sciences Department of the University of Florida at Gainesville, FL, USA
5
Zehl L, Jaillet F, Stoewer A, Grewe J, Sobolev A, Wachtler T, Brochier TG, Riehle A, Denker M, Grün S. Handling Metadata in a Neurophysiology Laboratory. Front Neuroinform 2016; 10:26. [PMID: 27486397 PMCID: PMC4949266 DOI: 10.3389/fninf.2016.00026] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/23/2016] [Accepted: 06/27/2016] [Indexed: 01/25/2023]
Abstract
To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often leave researchers too overburdened to produce such detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework.
Affiliation(s)
- Lyuba Zehl
  - Institute of Neuroscience and Medicine (INM-6), Institute for Advanced Simulation (IAS-6), JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
- Florent Jaillet
  - Laboratoire d'informatique Fondamentale, UMR 7279, Centre National de la Recherche Scientifique, Aix-Marseille Université, Marseille, France
  - Institut de Neurosciences de la Timone, UMR 7289, Centre National de la Recherche Scientifique, Aix-Marseille Université, Marseille, France
- Adrian Stoewer
  - Department of Biology II, Ludwig-Maximilians-Universität München, Martinsried, Germany
- Jan Grewe
  - Institut for Neurobiology, Abteilung Neuroethologie, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
- Andrey Sobolev
  - Department of Biology II, Ludwig-Maximilians-Universität München, Martinsried, Germany
- Thomas Wachtler
  - Department of Biology II, Ludwig-Maximilians-Universität München, Martinsried, Germany
- Thomas G Brochier
  - Institut de Neurosciences de la Timone, UMR 7289, Centre National de la Recherche Scientifique, Aix-Marseille Université, Marseille, France
- Alexa Riehle
  - Institut de Neurosciences de la Timone, UMR 7289, Centre National de la Recherche Scientifique, Aix-Marseille Université, Marseille, France
  - Institute of Neuroscience and Medicine (INM-6), Jülich Research Centre, Jülich, Germany
- Michael Denker
  - Institute of Neuroscience and Medicine (INM-6), Institute for Advanced Simulation (IAS-6), JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
- Sonja Grün
  - Institute of Neuroscience and Medicine (INM-6), Institute for Advanced Simulation (IAS-6), JARA BRAIN Institute I, Jülich Research Centre, Jülich, Germany
  - Theoretical Systems Neurobiology, RWTH Aachen University, Aachen, Germany
6
Klie S, Krueger S, Krall L, Giavalisco P, Flügge UI, Willmitzer L, Steinhauser D. Analysis of the compartmentalized metabolome - a validation of the non-aqueous fractionation technique. Front Plant Sci 2011; 2:55. [PMID: 22645541 PMCID: PMC3355776 DOI: 10.3389/fpls.2011.00055] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 05/01/2011] [Accepted: 09/05/2011] [Indexed: 05/17/2023]
Abstract
With the development of high-throughput metabolic technologies, a plethora of primary and secondary compounds have been detected in the plant cell. However, there are still major gaps in our understanding of the plant metabolome. This is especially true with regard to the compartmental localization of these identified metabolites. Non-aqueous fractionation (NAF) is a powerful technique for the determination of subcellular metabolite distributions in eukaryotic cells, and it has become the method of choice to analyze the distribution of a large number of metabolites concurrently. However, the NAF technique produces a continuous gradient of metabolite distributions, not discrete assignments. Resolving these distributions into compartmental localizations requires computational analyses based on marker molecules. In this article we focus on expanding the computational analysis of data derived from NAF. Along with an experimental workflow, we describe the critical steps in NAF experiments and how computational approaches can aid in assessing the quality and robustness of the derived data. For this, we have developed and provide a new version (v1.2) of the BestFit command line tool for calculation and evaluation of subcellular metabolite distributions. Furthermore, using both simulated and experimental data, we show how the estimated subcellular distributions are influenced by modulating important parameters, such as the number of fractions taken or which marker molecule is selected. Finally, we discuss caveats and benefits of NAF analysis in the context of the compartmentalized metabolome.
Affiliation(s)
- Sebastian Klie
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Stephan Krueger
  - Botanical Institute II, University of Cologne, Cologne, Germany
- Leonard Krall
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Patrick Giavalisco
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Ulf-Ingo Flügge
  - Botanical Institute II, University of Cologne, Cologne, Germany
- Lothar Willmitzer
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Dirk Steinhauser
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
  - *Correspondence: Dirk Steinhauser, Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
7
Abstract
In December 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent). The goal was to help researchers integrate multiple phylogenetic software tools into automated workflows. Participants addressed deficiencies in interoperability between programs by implementing “glue code” and improving support for phylogenetic data exchange standards (particularly NEXUS) across the toolkits. The work was guided by use-cases compiled in advance by both developers and users, and the code was documented as it was developed. The resulting software is freely available for both users and developers through incorporation into the distributions of several widely-used open-source toolkits. We explain the motivation for the hackathon and how it was organized, and discuss some of the outcomes and lessons learned. We conclude that hackathons are an effective mode of solving problems in software interoperability and usability, and are underutilized in scientific software development.