1
Cheng JH, Zheng C, Yamada R, Okada D. Visualization of the landscape of the read alignment shape of ATAC-seq data using Hellinger distance metric. Genes Cells 2024; 29:5-16. [PMID: 37989133 DOI: 10.1111/gtc.13082] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Revised: 10/25/2023] [Accepted: 10/28/2023] [Indexed: 11/23/2023]
Abstract
Assay for Transposase-Accessible Chromatin using high-throughput sequencing (ATAC-seq) is a popular technique that uses next-generation sequencing to measure chromatin accessibility and identify open chromatin regions. While read alignment shape information of next-generation sequencing data with intensity information has been used in various bioinformatics methods, few studies have focused on pure shape information alone. In this study, we investigated what types of ATAC-seq read alignment shapes are observed for the promoter region and whether the pure shape information was related or unrelated to other gene features. We introduced a novel concept and pipeline for handling the pure shape information of NGS data as probability distributions and quantifying their dissimilarities by information theory. Based on this concept, we demonstrate that the pure shape information of ATAC-seq data is correlated with chromatin openness and some gene characteristics. On the other hand, it is suggested that the pure information of ATAC-seq read alignment shape is unlikely to contain additional information to explain differences in RNA expression. Our study suggests that viewing the read alignment shape of NGS data as probability distributions enables us to capture the characteristics of the genome-wide landscape of such data in a non-parametric manner.
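The core idea of this abstract, treating a read alignment shape as a probability distribution and comparing shapes with the Hellinger distance, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `to_distribution` and `hellinger`, the binning, and the toy coverage vectors are assumptions made only for this example.

```python
import math


def to_distribution(coverage):
    """Normalize per-bin read coverage so it sums to 1 (keeps shape, drops intensity)."""
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage must contain at least one positive value")
    return [c / total for c in coverage]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same bins.

    H(p, q) = sqrt( 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical coverage profiles over five bins around a promoter.
peaked = [1, 2, 10, 2, 1]
flat = [3, 3, 4, 3, 3]
scaled_peaked = [10 * c for c in peaked]  # same shape, ten times the depth

print(hellinger(to_distribution(peaked), to_distribution(flat)))
print(hellinger(to_distribution(peaked), to_distribution(scaled_peaked)))
```

Because each profile is normalized before comparison, the distance depends only on shape: a profile and a uniformly scaled copy of it are at distance zero, while two profiles with no overlapping bins are at the maximum distance of one.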
Affiliation(s)
- Jian Hao Cheng
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Cheng Zheng
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ryo Yamada
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Daigo Okada
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan

2
Mani I, Singh V. Applications of bioinformatics in epigenetics. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2023; 198:1-13. [PMID: 37225316 DOI: 10.1016/bs.pmbts.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Epigenetic modifications such as DNA methylation, post-translational chromatin modifications and non-coding RNA-mediated mechanisms are responsible for epigenetic inheritance. Changes in gene expression due to these epigenetic modifications are responsible for new traits in different organisms and can lead to various diseases, including cancer, diabetic kidney disease (DKD), diabetic nephropathy (DN) and renal fibrosis. Bioinformatics is an effective approach for epigenomic profiling. Epigenomic data can be analyzed with a large number of bioinformatics tools and software packages. Many databases are available online, and they hold a huge amount of information on these modifications. Recent methodologies include many sequencing and analytical techniques to extrapolate different types of epigenetic data. These data can be used to design drugs against diseases linked to epigenetic modifications. This chapter briefly highlights different epigenetics databases (MethDB, REBASE, Pubmeth, MethPrimerDB, Histone Database, ChromDB, MeInfoText database, EpimiR, Methylome DB, and dbHiMo) and tools (compEpiTools, CpGProD, MethBlAST, EpiExplorer, and BiQ analyzer), which are used to retrieve the data and to analyze epigenetic modifications mechanistically.
Affiliation(s)
- Indra Mani
  - Department of Microbiology, Gargi College, University of Delhi, New Delhi, India
- Vijai Singh
  - Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana, Gujarat, India

3
Bürger A, Dugas M. Cogito: automated and generic comparison of annotated genomic intervals. BMC Bioinformatics 2022; 23:315. [PMID: 35927614 PMCID: PMC9351259 DOI: 10.1186/s12859-022-04853-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/23/2022] [Indexed: 11/27/2022] Open
Abstract
Background: Genetic and epigenetic biological studies often combine different types of experiments and multiple conditions. While the corresponding raw and processed data are made available through specialized public databases, the processed files are usually limited to a specific research question. Hence, they are unsuitable for an unbiased, systematic overview of a complex dataset. However, possible combinations of different sample types and conditions grow exponentially with the amount of sample types and conditions. Therefore the risk to miss a correlation or to overrate an identified correlation should be mitigated in a complex dataset. Since reanalysis of a full study is rarely a viable option, new methods are needed to address these issues systematically, reliably, reproducibly and efficiently.
Results: Cogito “COmpare annotated Genomic Intervals TOol” provides a workflow for an unbiased, structured overview and systematic analysis of complex genomic datasets consisting of different data types (e.g. RNA-seq, ChIP-seq) and conditions. Cogito is able to visualize valuable key information of genomic or epigenomic interval-based data, thereby providing a straightforward analysis approach for comparing different conditions. It supports getting an unbiased impression of a dataset and developing an appropriate analysis strategy for it. In addition to a text-based report, Cogito offers a fully customizable report as a starting point for further in-depth investigation.
Conclusions: Cogito implements a novel approach to facilitate high-level overview analyses of complex datasets, and offers additional insights into the data without the need for a full, time-consuming reanalysis. The R/Bioconductor package is freely available at https://bioconductor.org/packages/release/bioc/html/Cogito.html, a comprehensive documentation with detailed descriptions and reproducible examples is included.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04853-1.
Affiliation(s)
- Annika Bürger
  - Institute of Medical Informatics, Westfälische Wilhelms-Universität Münster, Albert-Schweitzer-Campus 1, 48149, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Seminarstr. 2, 69117, Heidelberg, Germany

4
Salek Farrokhi A, Mohammadlou M, Abdollahi M, Eslami M, Yousefi B. Histone Deacetylase Modifications by Probiotics in Colorectal Cancer. J Gastrointest Cancer 2021; 51:754-764. [PMID: 31808058 DOI: 10.1007/s12029-019-00338-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
It has been demonstrated that epigenetic modifications of histones (acetylation/deacetylation) play a critical role in cancer progression by regulating gene expression. Several processes could be regulated by deacetylation of histone and non-histone proteins, such as apoptosis, proliferation, cell metabolism, differentiation, and DNA repair. Hence, histone deacetylase inhibitors (HDACis) are employed as a promising group of anti-cancer drugs that could inhibit tumor cell proliferation or apoptosis. The elimination of the acetylation marks that takes place as an essential epigenetic change in cancer cells is associated with HDAC expression and activity. In this regard, it has been reported that class I HDACs have a vital role in the regulation of tumor cell proliferation. OBJECTIVES: In this review, we discuss whether microorganisms of gut origin could promote cancer or tumor resistance and explain the mechanisms of these processes. CONCLUSIONS: Given the enormous metabolic capacity of the intestinal microbiota, bacteria are likely to convert nutrients and digestive compounds into metabolites that regulate epigenetics in cancer. The effect of food on epigenetic changes in the intestinal mucosa and colonocytes is of interest, as misleading nucleotide methylation may be a prognostic marker for colorectal cancer (CRC). Since epigenetic changes are potentially reversible, they can serve as therapeutic targets for preventing CRC. However, various mechanisms have been identified in the field of prevention, treatment, and progression of cancer by probiotics, which include intestinal microbiota modulation, increased intestinal barrier function, degradation of potential carcinogens, protective effect on intestinal epithelial damage, and increased immune function.
Affiliation(s)
- Amir Salek Farrokhi
  - Department of Immunology, Semnan University of Medical Sciences, Semnan, Iran
- Maryam Mohammadlou
  - Department of Immunology, Semnan University of Medical Sciences, Semnan, Iran
- Maryam Abdollahi
  - Department of Immunology, Semnan University of Medical Sciences, Semnan, Iran
- Majid Eslami
  - Cancer Research Center, Semnan University of Medical Sciences, Semnan, Iran
- Bahman Yousefi
  - Department of Immunology, Semnan University of Medical Sciences, Semnan, Iran

5
Kim M, Lin S. Characterization of histone modification patterns and prediction of novel promoters using functional principal component analysis. PLoS One 2020; 15:e0233630. [PMID: 32459819 PMCID: PMC7252632 DOI: 10.1371/journal.pone.0233630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2020] [Accepted: 05/08/2020] [Indexed: 12/04/2022] Open
Abstract
Characterization of distinct histone methylation and acetylation binding patterns in promoters and prediction of novel regulatory regions remains an important area of genomic research, as it is hypothesized that distinct chromatin signatures may specify unique genomic functions. However, methods that have been proposed in the literature are either descriptive in nature or are fully parametric and hence more restrictive in pattern discovery. In this article, we propose a two-step non-parametric statistical inference procedure to characterize unique histone modification patterns and apply it to analyzing the binding patterns of four histone marks, H3K4me2, H3K4me3, H3K9ac, and H4K20me1, in human B-lymphoblastoid cells. In the first step, we used a functional principal component analysis method to represent the concatenated binding patterns of these four histone marks around the transcription start sites as smooth curves. In the second step, we clustered these curves to reveal several unique classes of binding patterns. These uncovered patterns were used in turn to scan the whole-genome to predict novel and alternative promoters. Our analyses show that there are three distinct promoter binding patterns of active genes. Further, 19654 regions not within known gene promoters were found to overlap with human ESTs, CpG islands, or common SNPs, indicative of their potential role in gene regulation, including being potential novel promoter regions.
Affiliation(s)
- Mijeong Kim
  - Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Shili Lin
  - Department of Statistics, Ohio State University, Columbus, Ohio, United States of America

6
Meo M, Meste O, Signore S, Rota M. Novel Methods for High-resolution Assessment of Cardiac Action Potential Repolarization. Biomed Signal Process Control 2020; 51:30-41. [PMID: 31938034 DOI: 10.1016/j.bspc.2019.02.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
Abstract
The profile of the action potential (AP) of cardiomyocytes contributes to the modality of ventricular repolarization of the heart. Experimentally, the examination of the AP in isolated cardiomyocytes provides information on their electrical properties, adaptations to physiological and pathological conditions, and putative ionic mechanisms involved in the process. Currently, there are no available platforms for automated assessment of AP properties and standard methodologies restrict the examination of the AP repolarization to discrete, user-defined ranges, neglecting significant intervals of the electrical recovery. This study proposes two automatic methods to assess AP profile throughout the entire repolarization phase. One method is based on AP data inversion and direct extraction of patterns describing beat-to-beat dynamics. The second method is based on evolutive singular value decomposition (ESVD), which identifies common patterns in a series of consecutive APs. The two methodologies were employed to analyze electrical signals collected from cardiomyocytes obtained from healthy mice and animals with diabetes, a condition associated with alterations of AP properties in cardiac cells. Our methodologies revealed that the duration of the early repolarization phase of the AP tended to become progressively longer during a stimulation train, whereas the late repolarization progressively shortened. Although this behavior was comparable in the two groups of cells, alterations in AP dynamics occurred at distinct repolarization levels, a feature highlighted by the ESVD approach. In conclusion, the proposed methodologies allow detailed, automatic analysis of the AP repolarization and identification of critical alterations occurring in the electrical behavior of myocytes under pathological conditions.
Affiliation(s)
- Marianna Meo
  - IHU Liryc, Electrophysiology and Heart Modeling Institute, Bordeaux University Foundation, F-33600 Pessac-Bordeaux, France, with Univ. Bordeaux, CRCTB, U1045, Bordeaux, France, and with INSERM, CRCTB, U1045, Bordeaux, France
- Sergio Signore
  - Departments of Anesthesia and Medicine, and Division of Cardiovascular Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Marcello Rota
  - Department of Physiology, New York Medical College, Valhalla, NY 10595, USA

7
Cremona MA, Xu H, Makova KD, Reimherr M, Chiaromonte F, Madrigal P. Functional data analysis for computational biology. Bioinformatics 2019; 35:3211-3213. [PMID: 30668667 PMCID: PMC6736445 DOI: 10.1093/bioinformatics/btz045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/07/2018] [Revised: 01/01/2019] [Accepted: 01/17/2019] [Indexed: 12/25/2022] Open
Abstract
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marzia A Cremona
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Hongyan Xu
  - Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
- Kateryna D Makova
  - Department of Biology, The Pennsylvania State University, University Park, PA, USA
  - Center for Medical Genomics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Matthew Reimherr
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Francesca Chiaromonte
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
  - Institute of Economics, Sant’Anna School of Advanced Studies, EMbeDS Economics and Management in the era of Data Science, Pisa, Italy
- Pedro Madrigal
  - Wellcome Trust – MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK

8
Ferguson J, Atit RP. A tale of two cities: The genetic mechanisms governing calvarial bone development. Genesis 2019; 57:e23248. [PMID: 30155972 PMCID: PMC7433025 DOI: 10.1002/dvg.23248] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/27/2018] [Revised: 08/21/2018] [Accepted: 08/23/2018] [Indexed: 12/25/2022]
Abstract
The skull bones must grow in a coordinated, three-dimensional manner to coalesce and form the head and face. Mammalian skull bones have a dual embryonic origin from cranial neural crest cells (CNCC) and paraxial mesoderm (PM) and ossify through intramembranous ossification. The calvarial bones, the bones of the cranium which cover the brain, are derived from the supraorbital arch (SOA) region mesenchyme. The SOA is the site of frontal and parietal bone morphogenesis and primary center of ossification. The objective of this review is to frame our current in vivo understanding of the morphogenesis of the calvarial bones and the gene networks regulating calvarial bone initiation in the SOA mesenchyme.
Affiliation(s)
- James Ferguson
  - Department of Biology, Case Western Reserve University, Cleveland, OH 44106
  - Department of Genetics, Case Western Reserve University, Cleveland OH 44106
  - Department of Dermatology, Case Western Reserve University, Cleveland OH 44106
- Radhika P. Atit
  - Department of Biology, Case Western Reserve University, Cleveland, OH 44106
  - Department of Genetics, Case Western Reserve University, Cleveland OH 44106
  - Department of Dermatology, Case Western Reserve University, Cleveland OH 44106

9
Stavrovskaya ED, Niranjan T, Fertig EJ, Wheelan SJ, Favorov AV, Mironov AA. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data. Bioinformatics 2018; 33:3158-3165. [PMID: 29028265 DOI: 10.1093/bioinformatics/btx379] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/22/2016] [Accepted: 06/12/2017] [Indexed: 12/13/2022] Open
Abstract
Motivation: Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related, for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required.
Results: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics.
Availability and implementation: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/.
Contact: favorov@sensi.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elena D Stavrovskaya
  - Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia
  - Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
- Tejasvi Niranjan
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Elana J Fertig
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sarah J Wheelan
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Alexander V Favorov
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
  - Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, RAS, Moscow 119333, Russia
  - Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow 117545, Russia
- Andrey A Mironov
  - Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia
  - Institute for Information Transmission Problems, RAS, Moscow 127994, Russia

10
Dozmorov MG. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning. Bioinformatics 2018; 33:3323-3330. [PMID: 29028263 DOI: 10.1093/bioinformatics/btx414] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/05/2017] [Accepted: 06/22/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation: One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data.
Results: The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context.
Contact: mikhail.dozmorov@vcuhealth.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA

11
Madrigal P. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets. Bioinformatics 2017; 33:746-748. [PMID: 27993776 PMCID: PMC5408813 DOI: 10.1093/bioinformatics/btw724] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/19/2016] [Accepted: 11/16/2016] [Indexed: 01/08/2023] Open
Abstract
Summary: Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers.
Availability and Implementation: An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pedro Madrigal
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK

12
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1506] [Impact Index Per Article: 167.3] [Indexed: 12/27/2022] Open
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
  - Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
- Pedro Madrigal
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK
- Sonia Tarazona
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
- David Gomez-Cabrero
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden
  - Science for Life Laboratory, 17121, Solna, Sweden
- Alejandra Cervera
  - Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
- Andrew McPherson
  - School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
- Michał Wojciech Szcześniak
  - Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
- Daniel J Gaffney
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura L Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Xuegong Zhang
  - Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA