1
Eberth S, Koblitz J, Steenpaß L, Pommerenke C. Refined variant calling pipeline on RNA-seq data of breast cancer cell lines without matched-normal samples. BMC Res Notes 2025; 18:67. [PMID: 39955561 PMCID: PMC11829467 DOI: 10.1186/s13104-025-07140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Accepted: 02/04/2025] [Indexed: 02/17/2025] Open
Abstract
Objective: RNA-seq delivers valuable insights into both the transcriptional patterns and the mutational landscapes of transcribed genes. However, because tumour cell lines frequently lack a matched-normal counterpart, variant calling without the paired normal sample remains challenging. To exclude common genetic variation without a matched-normal control, filtering strategies are needed that identify tumour-relevant variants in cell lines. Results: Variants of 29 breast cancer cell lines were called on RNA-seq data with HaplotypeCaller. Low read depth sites, RNA-edit sites, and low complexity regions in coding regions were excluded. Common variants were filtered using 1000 Genomes, gnomAD, and dbSNP data. Starting from hundreds of thousands of single nucleotide variants and small insertions and deletions, about a thousand variants remained per sample after filtering. Extracted variants were validated against the Catalogue of Somatic Mutations in Cancer (COSMIC) for the 10 cell lines included in both data sets. Approximately half of the COSMIC variants were successfully called. Importantly, missing variants could mainly be attributed to sites with low read depth. Moreover, the filtered variants included all 10 Cancer Gene Census COSMIC variants, a condensed hallmark variant set.
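The filtering cascade described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the `VariantCall` record, the two thresholds, the example coordinates, and the choice to drop every dbSNP-listed call are all assumptions, since the abstract states no cut-offs or implementation details.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cut-offs for illustration; the abstract gives no thresholds.
MIN_READ_DEPTH = 10
MAX_POPULATION_AF = 0.01


@dataclass(frozen=True)
class VariantCall:
    """One called variant with the annotations the filters need (assumed layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    read_depth: int
    population_af: float     # highest allele frequency in 1000 Genomes/gnomAD, 0.0 if absent
    in_dbsnp: bool           # listed in dbSNP
    is_rna_edit_site: bool   # overlaps a known RNA-editing site
    in_low_complexity: bool  # falls in a low complexity region


def is_candidate(call: VariantCall) -> bool:
    """Return True if the call survives every exclusion filter."""
    if call.read_depth < MIN_READ_DEPTH:
        return False  # too few reads to trust the call
    if call.is_rna_edit_site or call.in_low_complexity:
        return False  # likely RNA editing or an alignment artefact
    if call.in_dbsnp or call.population_af > MAX_POPULATION_AF:
        return False  # treated as common germline variation (simplifying assumption)
    return True


def filter_calls(calls: Iterable[VariantCall]) -> List[VariantCall]:
    """Keep only the calls that pass all filters, preserving input order."""
    return [c for c in calls if is_candidate(c)]


# Made-up example records; the positions are placeholders, not real variants.
example_calls = [
    VariantCall("chr1", 1000, "C", "T", 85, 0.0, False, False, False),   # kept
    VariantCall("chr1", 2000, "G", "A", 4, 0.0, False, False, False),    # low depth
    VariantCall("chr2", 3000, "A", "G", 60, 0.0, False, True, False),    # RNA-edit site
    VariantCall("chr3", 4000, "T", "C", 50, 0.25, True, False, False),   # common variant
    VariantCall("chr4", 5000, "CA", "C", 40, 0.0, False, False, True),   # low complexity
]
kept = filter_calls(example_calls)
print(f"{len(kept)} of {len(example_calls)} calls remain after filtering")
```

Ordering the checks from cheapest to most specific keeps the function easy to extend; in practice each flag would come from annotating the call set against the corresponding resource rather than from hand-written records.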
Affiliation(s)
- Sonja Eberth
  - Human and Animal Cell Lines, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Julia Koblitz
  - Bioinformatics, IT and Databases, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Laura Steenpaß
  - Human and Animal Cell Lines, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
  - Zoological Institute, Technische Universität Braunschweig, 38106, Braunschweig, Germany
- Claudia Pommerenke
  - Bioinformatics, IT and Databases, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
2
Stojkovic M, Ortuño Guzmán FM, Han D, Stojkovic P, Dopazo J, Stankovic KM. Polystyrene nanoplastics affect transcriptomic and epigenomic signatures of human fibroblasts and derived induced pluripotent stem cells: Implications for human health. Environmental Pollution (Barking, Essex: 1987) 2023; 320:120849. [PMID: 36509347 DOI: 10.1016/j.envpol.2022.120849] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/13/2022] [Revised: 12/01/2022] [Accepted: 12/07/2022] [Indexed: 06/17/2023]
Abstract
Plastic pollution is increasing at an alarming rate, yet the impact of this pollution on human health is poorly understood. Because human induced pluripotent stem cells (hiPSC) are frequently derived from dermal fibroblasts, these cells offer a powerful platform for the identification of molecular biomarkers of environmental pollution in human cells. Here, we describe a novel proof-of-concept for deriving hiPSC from human dermal fibroblasts deliberately exposed to polystyrene (PS) nanoplastic particles; unexposed hiPSC served as controls. In parallel, unexposed hiPSC were exposed to low and high concentrations of PS nanoparticles. Transcriptomic and epigenomic signatures of all fibroblasts and hiPSC were defined using RNA-seq and whole genome methyl-seq, respectively. Both PS-treated fibroblasts and derived hiPSC showed alterations in the expression of the ESRRB and HNF1A genes and of circuits involved in the pluripotency of stem cells, as well as in pathways involved in cancer, inflammatory disorders, gluconeogenesis, carbohydrate metabolism, innate immunity, and dopaminergic synapse. Similarly, the key genes identified with transcriptional and DNA methylation changes (DNMT3A, ESRRB, FAM133CP, HNF1A, SEPTIN7P8, and TTC34) were significantly affected in both PS-exposed fibroblasts and hiPSC. This study illustrates the power of human cellular models of environmental pollution to narrow down and prioritize the list of candidate molecular biomarkers of environmental pollution. This knowledge will facilitate the deciphering of the origins of environmental diseases.
Affiliation(s)
- Dongjun Han
  - Otolaryngology - Head & Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
- Joaquin Dopazo
  - Bioinformatics Area, Andalusian Public Foundation Progress and Health-FPS, Sevilla, 41013, Spain; Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Seville, Spain; Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen Del Rocío, Seville, Spain
- Konstantina M Stankovic
  - Otolaryngology - Head & Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
3
Garcia BJ, Urrutia J, Zheng G, Becker D, Corbet C, Maschhoff P, Cristofaro A, Gaffney N, Vaughn M, Saxena U, Chen YP, Gordon DB, Eslami M. A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists. Synthetic Biology (Oxford, England) 2022; 7:ysac012. [PMID: 36035514 PMCID: PMC9408027 DOI: 10.1093/synbio/ysac012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2021] [Revised: 06/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022]
Abstract
Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis form a complex, multistep process with implications for both the discovery and the reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. The pipeline combines traditional RNASeq data processing tools with structured metadata tracking to allow exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate its utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
Affiliation(s)
- Benjamin J Garcia
  - Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Joshua Urrutia
  - Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Alexander Cristofaro
  - Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Niall Gaffney
  - Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Matthew Vaughn
  - Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Uma Saxena
  - Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- D Benjamin Gordon
  - Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
4
Adetunji MO, Abraham BJ. SEAseq: a portable and cloud-based chromatin occupancy analysis suite. BMC Bioinformatics 2022; 23:77. [PMID: 35193506 PMCID: PMC8864840 DOI: 10.1186/s12859-022-04588-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 01/28/2022] [Indexed: 11/26/2022] Open
Abstract
Background: Genome-wide protein-DNA binding is popularly assessed using specific antibody pulldown in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) or Cleavage Under Targets and Release Using Nuclease (CUT&RUN) sequencing experiments. These technologies generate high-throughput sequencing data that necessitate the use of multiple sophisticated, computationally intensive genomic tools to make discoveries, but these genomic tools often have a high barrier to use because of computational resource constraints. Results: We present a comprehensive, infrastructure-independent, computational pipeline called SEAseq, which leverages field-standard, open-source tools for processing and analyzing ChIP-Seq/CUT&RUN data. SEAseq performs extensive analyses from the raw output of the experiment, including alignment, peak calling, motif analysis, promoter and metagene coverage profiling, peak annotation distribution, clustered/stitched peak (e.g. super-enhancer) identification, and multiple relevant quality assessment metrics, as well as automatic interfacing with data in GEO/SRA. SEAseq provides a rapid and cost-effective resource for the analysis of both new and publicly available datasets, as demonstrated in our comparative case studies. Conclusions: The easy-to-use and versatile design of SEAseq makes it a reliable and efficient resource for ensuring high quality analysis. Its cloud implementation enables a broad suite of analyses in environments with constrained computational resources. SEAseq is platform-independent and is intended to be usable by everyone, with or without programming skills. It is available on the cloud at https://platform.stjude.cloud/workflows/seaseq and can be locally installed from the repository at https://github.com/stjude/seaseq. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04588-z.
Affiliation(s)
- Modupeore O Adetunji
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Brian J Abraham
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
5
Miloradovic D, Pavlovic D, Jankovic MG, Nikolic S, Papic M, Milivojevic N, Stojkovic M, Ljujic B. Human Embryos, Induced Pluripotent Stem Cells, and Organoids: Models to Assess the Effects of Environmental Plastic Pollution. Front Cell Dev Biol 2021; 9:709183. [PMID: 34540831 PMCID: PMC8446652 DOI: 10.3389/fcell.2021.709183] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/13/2021] [Accepted: 07/19/2021] [Indexed: 02/03/2023] Open
Abstract
For a long time, animal models were used to mimic human biology and diseases. However, animal models are not an ideal solution because of the numerous interspecies differences between humans and animals. New technologies, such as human-induced pluripotent stem cells and three-dimensional (3D) cultures such as organoids, represent promising solutions for replacing, refining, and reducing animal models. The capacity of organoids to differentiate, self-organize, and form specific, complex, biologically suitable structures makes them excellent in vitro models of development and disease pathogenesis, as well as drug-screening platforms. Despite their significant potential health advantages, further studies and careful attention to their limitations are necessary before their clinical use. This article summarizes the definitions of embryoids, gastruloids, and organoids and clarifies their application as models for early development, diseases, environmental pollution, drug screening, and bioinformatics.
Affiliation(s)
- Dragana Miloradovic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
- Dragica Pavlovic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
- Marina Gazdic Jankovic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
- Sandra Nikolic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
- Milos Papic
  - Department of Dentistry, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
- Nevena Milivojevic
  - Laboratory for Bioengineering, Department of Science, Institute for Information Technologies, University of Kragujevac, Kragujevac, Serbia
- Miodrag Stojkovic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
  - SPEBO Medical Fertility Hospital, Leskovac, Serbia
- Biljana Ljujic
  - Department of Genetics, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
6
Rian K, Hidalgo MR, Çubuk C, Falco MM, Loucera C, Esteban-Medina M, Alamo-Alvarez I, Peña-Chilet M, Dopazo J. Genome-scale mechanistic modeling of signaling pathways made easy: A bioconductor/cytoscape/web server framework for the analysis of omic data. Comput Struct Biotechnol J 2021; 19:2968-2978. [PMID: 34136096 PMCID: PMC8170118 DOI: 10.1016/j.csbj.2021.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 12/13/2022] Open
Abstract
Genome-scale mechanistic models of pathways are gaining importance for genomic data interpretation because they provide a natural link between genotype measurements (transcriptomics or genomics data) and the phenotype of the cell (its functional behavior). Moreover, mechanistic models can be used to predict the potential effect of interventions, including drug inhibitions. Here, we present the implementation of a mechanistic model of cell signaling for the interpretation of transcriptomic data as an R/Bioconductor package, a Cytoscape plugin and a web tool with enhanced functionality which includes building interpretable predictors, estimation of the effect of perturbations and assessment of the effect of mutations in complex scenarios.
Affiliation(s)
- Kinza Rian
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Laboratory of Innovative Technologies (LTI), National School of Applied Sciences in Tangier, UAE, Morocco
- Marta R. Hidalgo
  - Bioinformatics and Biostatistics Unit, Centro de Investigación Príncipe Felipe (CIPF), 46012 Valencia, Spain
- Cankut Çubuk
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Matias M. Falco
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
- Carlos Loucera
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Marina Esteban-Medina
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Inmaculada Alamo-Alvarez
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- María Peña-Chilet
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Joaquín Dopazo
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
  - Functional Genomics Node (INB-ELIXIR-es), Sevilla, Spain