201
|
Ayturk UM, Jacobsen CM, Christodoulou DC, Gorham J, Seidman JG, Seidman CE, Robling AG, Warman ML. An RNA-seq protocol to identify mRNA expression changes in mouse diaphyseal bone: applications in mice with bone property altering Lrp5 mutations. J Bone Miner Res 2013; 28:2081-93. [PMID: 23553928 PMCID: PMC3743099 DOI: 10.1002/jbmr.1946] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 02/21/2013] [Revised: 03/18/2013] [Accepted: 03/22/2013] [Indexed: 01/20/2023]
Abstract
Loss-of-function and certain missense mutations in the Wnt coreceptor low-density lipoprotein receptor-related protein 5 (LRP5) significantly decrease or increase bone mass, respectively. These human skeletal phenotypes have been recapitulated in mice harboring Lrp5 knockout and knock-in mutations. We hypothesized that measuring mRNA expression in diaphyseal bone from mice with Lrp5 wild-type (Lrp5(+/+)), knockout (Lrp5(-/-)), and high bone mass (HBM)-causing (Lrp5(p.A214V/+)) knock-in alleles could identify genes and pathways that regulate or are regulated by LRP5 activity. We performed RNA-seq on pairs of tibial diaphyseal bones from four 16-week-old mice with each of the aforementioned genotypes. We then evaluated different methods for controlling for contaminating nonskeletal tissue (ie, blood, bone marrow, and skeletal muscle) in our data. These methods included predigestion of diaphyseal bone with collagenase and separate transcriptional profiling of blood, skeletal muscle, and bone marrow. We found that collagenase digestion reduced contamination, but also altered gene expression in the remaining cells. In contrast, in silico filtering of the diaphyseal bone RNA-seq data for highly expressed blood, skeletal muscle, and bone marrow transcripts significantly increased the correlation between RNA-seq data from an animal's right and left tibias and from animals with the same Lrp5 genotype. We conclude that reliable and reproducible RNA-seq data can be obtained from mouse diaphyseal bone and that lack of LRP5 has a more pronounced effect on gene expression than the HBM-causing LRP5 missense mutation. We identified 84 differentially expressed protein-coding transcripts between LRP5 "sufficient" (ie, Lrp5(+/+) and Lrp5(p.A214V/+)) and "insufficient" (Lrp5(-/-)) diaphyseal bone, and far fewer differentially expressed genes between Lrp5(p.A214V/+) and Lrp5(+/+) diaphyseal bone.
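The in silico filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the expression threshold, and all expression values below are made-up assumptions chosen only to show the principle, namely that dropping transcripts highly expressed in nonskeletal tissues can raise the left/right tibia correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def contaminant_genes(tissue_profiles, threshold):
    """Genes whose expression reaches the threshold in any nonskeletal tissue."""
    flagged = set()
    for profile in tissue_profiles.values():
        flagged.update(g for g, level in profile.items() if level >= threshold)
    return flagged

def correlation_between(left, right, exclude=frozenset()):
    """Correlate two samples over their shared genes, minus excluded genes."""
    genes = sorted((set(left) & set(right)) - set(exclude))
    return pearson([left[g] for g in genes], [right[g] for g in genes])

# Toy expression values (hypothetical, arbitrary units), not data from the study.
left_tibia = {"Col1a1": 900, "Sost": 120, "Dmp1": 300, "Bglap": 500,
              "Hbb-b1": 4000, "Acta1": 50}
right_tibia = {"Col1a1": 880, "Sost": 130, "Dmp1": 310, "Bglap": 480,
               "Hbb-b1": 200, "Acta1": 2500}
nonskeletal = {
    "blood": {"Hbb-b1": 9000, "Col1a1": 1},
    "muscle": {"Acta1": 7000, "Sost": 0},
}

flagged = contaminant_genes(nonskeletal, threshold=1000)  # threshold is an assumption
raw_r = correlation_between(left_tibia, right_tibia)
filtered_r = correlation_between(left_tibia, right_tibia, exclude=flagged)
print(f"flagged={sorted(flagged)} raw r={raw_r:.3f} filtered r={filtered_r:.3f}")
```

In this toy case the two flagged transcripts dominate the unfiltered comparison, so removing them is what lets the bone-derived signal show through; a real analysis would choose the threshold from the separately profiled tissues.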
Affiliation(s)
- Ugur M. Ayturk
- Department of Orthopaedic Surgery, Boston Children’s Hospital, Boston, MA
- Department of Genetics, Harvard Medical School, Boston, MA
- Christina M. Jacobsen
- Department of Genetics, Harvard Medical School, Boston, MA
- Department of Endocrinology, Boston Children’s Hospital, Boston, MA
- Joshua Gorham
- Department of Genetics, Harvard Medical School, Boston, MA
- Christine E. Seidman
- Department of Genetics, Harvard Medical School, Boston, MA
- Howard Hughes Medical Institute, Boston, MA
- Alexander G. Robling
- Department of Anatomy and Cell Biology, Indiana University School of Medicine, Indianapolis, IN
- Matthew L. Warman
- Department of Orthopaedic Surgery, Boston Children’s Hospital, Boston, MA
- Department of Genetics, Harvard Medical School, Boston, MA
- Howard Hughes Medical Institute, Boston, MA
|
202
|
Bonfert T, Csaba G, Zimmer R, Friedel CC. Mining RNA-seq data for infections and contaminations. PLoS One 2013; 8:e73071. [PMID: 24019895 PMCID: PMC3760913 DOI: 10.1371/journal.pone.0073071] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/14/2013] [Accepted: 07/16/2013] [Indexed: 02/06/2023]
Abstract
RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore alternative sources of expression other than the host cell and are little equipped to address the problems arising from redundancies and gaps among sequenced microbe and virus genomes. We show that screening of sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences) of species/strains can be assessed. Performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. Our study illustrates the importance and potentials of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights in infection processes and tumorigenesis can be obtained.
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Gergely Csaba
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Ralf Zimmer
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Caroline C. Friedel
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
|
203
|
Ilott NE, Ponting CP. Predicting long non-coding RNAs using RNA sequencing. Methods 2013; 63:50-9. [PMID: 23541739 DOI: 10.1016/j.ymeth.2013.03.019] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 01/30/2013] [Revised: 03/12/2013] [Accepted: 03/19/2013] [Indexed: 02/01/2023]
Abstract
The advent of next-generation sequencing, and in particular RNA-sequencing (RNA-seq), technologies has expanded our knowledge of the transcriptional capacity of human and other animal genomes. In particular, recent RNA-seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts from both within, and intervening between, known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations of their cell-type and developmental time-point restricted expression patterns. A variety of sequencing protocols are currently available for identifying lncRNAs including RNA polymerase II occupancy, chromatin state maps and - the focus of this review - deep RNA sequencing. In addition, there are numerous analytical methods available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-seq data. Here we review current methods for identifying lncRNAs using large-scale sequencing data from RNA-seq experiments and highlight analytical considerations that are required when undertaking such projects.
Affiliation(s)
- Nicholas E Ilott
- CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, UK.
|
204
|
Spicuglia S, Maqbool MA, Puthier D, Andrau JC. An update on recent methods applied for deciphering the diversity of the noncoding RNA genome structure and function. Methods 2013; 63:3-17. [DOI: 10.1016/j.ymeth.2013.04.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/04/2013] [Revised: 04/02/2013] [Accepted: 04/04/2013] [Indexed: 12/17/2022]
|
205
|
Imashimizu M, Oshima T, Lubkowska L, Kashlev M. Direct assessment of transcription fidelity by high-resolution RNA sequencing. Nucleic Acids Res 2013; 41:9090-104. [PMID: 23925128 PMCID: PMC3799451 DOI: 10.1093/nar/gkt698] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 11/22/2022]
Abstract
Cancerous and aging cells have long been thought to be impacted by transcription errors that cause genetic and epigenetic changes. Until now, a lack of methodology for directly assessing such errors hindered evaluation of their impact to the cells. We report a high-resolution Illumina RNA-seq method that can assess noncoded base substitutions in mRNA at 10⁻⁴–10⁻⁵ per base frequencies in vitro and in vivo. Statistically reliable detection of changes in transcription fidelity through ∼10³ nt DNA sites assures that the RNA-seq can analyze the fidelity in a large number of the sites where errors occur. A combination of the RNA-seq and biochemical analyses of the positions for the errors revealed two sequence-specific mechanisms that increase transcription fidelity by Escherichia coli RNA polymerase: (i) enhanced suppression of nucleotide misincorporation that improves selectivity for the cognate substrate, and (ii) increased backtracking of the RNA polymerase that decreases a chance of error propagation to the full-length transcript after misincorporation and provides an opportunity to proofread the error. This method is adoptable to a genome-wide assessment of transcription fidelity.
Affiliation(s)
- Masahiko Imashimizu
- Gene Regulation and Chromosome Biology Laboratory, Frederick National Laboratory for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA and Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan
|
206
|
Farkas MH, Grant GR, White JA, Sousa ME, Consugar MB, Pierce EA. Transcriptome analyses of the human retina identify unprecedented transcript diversity and 3.5 Mb of novel transcribed sequence via significant alternative splicing and novel genes. BMC Genomics 2013; 14:486. [PMID: 23865674 PMCID: PMC3924432 DOI: 10.1186/1471-2164-14-486] [Citation(s) in RCA: 136] [Impact Index Per Article: 11.3] [Received: 01/09/2013] [Accepted: 07/15/2013] [Indexed: 12/31/2022]
Abstract
BACKGROUND The retina is a complex tissue comprised of multiple cell types that is affected by a diverse set of diseases that are important causes of vision loss. Characterizing the transcripts, both annotated and novel, that are expressed in a given tissue has become vital for understanding the mechanisms underlying the pathology of disease. RESULTS We sequenced RNA prepared from three normal human retinas and characterized the retinal transcriptome at an unprecedented level due to the increased depth of sampling provided by the RNA-seq approach. We used a non-redundant reference transcriptome from all of the empirically-determined human reference tracks to identify annotated and novel sequences expressed in the retina. We detected 79,915 novel alternative splicing events, including 29,887 novel exons, 21,757 3' and 5' alternate splice sites, and 28,271 exon skipping events. We also identified 116 potential novel genes. These data represent a significant addition to the annotated human transcriptome. For example, the novel exons detected increase the number of identified exons by 3%. Using a high-throughput RNA capture approach to validate 14,696 of these novel transcriptome features we found that 99% of the putative novel events can be reproducibly detected. Further, 15-36% of the novel splicing events maintain an open reading frame, suggesting they produce novel protein products. CONCLUSIONS To our knowledge, this is the first application of RNA capture to perform large-scale validation of novel transcriptome features. In total, these analyses provide extensive detail about a previously uncharacterized level of transcript diversity in the human retina.
Affiliation(s)
- Michael H Farkas
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Berman-Gund Laboratory for the Study of Retinal Degenerations, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Gregory R Grant
- Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA, USA
- Joseph A White
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Berman-Gund Laboratory for the Study of Retinal Degenerations, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Maria E Sousa
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Berman-Gund Laboratory for the Study of Retinal Degenerations, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Mark B Consugar
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Berman-Gund Laboratory for the Study of Retinal Degenerations, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Eric A Pierce
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
- Berman-Gund Laboratory for the Study of Retinal Degenerations, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, MA, USA
|
207
|
Jodar M, Selvaraju S, Sendler E, Diamond MP, Krawetz SA. The presence, role and clinical use of spermatozoal RNAs. Hum Reprod Update 2013; 19:604-24. [PMID: 23856356 DOI: 10.1093/humupd/dmt031] [Citation(s) in RCA: 250] [Impact Index Per Article: 20.8] [Indexed: 12/15/2022]
Abstract
BACKGROUND Spermatozoa are highly differentiated, transcriptionally inert cells characterized by a compact nucleus with minimal cytoplasm. Nevertheless they contain a suite of unique RNAs that are delivered to the oocyte upon fertilization. They are likely integrated as part of many different processes including genome recognition, consolidation-confrontation, early embryonic development and epigenetic transgenerational inheritance. Spermatozoal RNAs also provide a window into the developmental history of each sperm thereby providing biomarkers of fertility and pregnancy outcome which are being intensely studied. METHODS Literature searches were performed to review the majority of spermatozoal RNA studies that described potential functions and clinical applications with emphasis on Next-Generation Sequencing. Human, mouse, bovine and stallion were compared as their distribution and composition of spermatozoal RNAs, using these techniques, have been described. RESULTS Comparisons highlighted the complexity of the population of spermatozoal RNAs that comprises rRNA, mRNA and both large and small non-coding RNAs. RNA-seq analysis has revealed that only a fraction of the larger RNAs retain their structure. While rRNAs are the most abundant and are highly fragmented, ensuring a translationally quiescent state, other RNAs including some mRNAs retain their functional potential, thereby increasing the opportunity for regulatory interactions. Abundant small non-coding RNAs retained in spermatozoa include miRNAs and piRNAs. Some, like miR-34c, are essential to the early embryo development required for the first cellular division. Others, like the piRNAs, are likely part of the genomic dance of confrontation and consolidation. Other non-coding spermatozoal RNAs include transposable elements, annotated lnc-RNAs, intronic retained elements, exonic elements, chromatin-associated RNAs, small-nuclear ILF3/NF30 associated RNAs, quiescent RNAs, mse-tRNAs and YRNAs. Some non-coding RNAs are known to act as epigenetic modifiers, inducing histone modifications and DNA methylation, perhaps playing a role in transgenerational epigenetic inheritance. Transcript profiling holds considerable potential for the discovery of fertility biomarkers for both agriculture and human medicine. Comparing the differential RNA profiles of infertile and fertile individuals, as well as assessing species similarities, should resolve the regulatory pathways contributing to male factor infertility. CONCLUSIONS Dad delivers a complex population of RNAs to the oocyte at fertilization that likely influences fertilization, embryo development, the phenotype of the offspring and possibly future generations. Development is continuing on the use of spermatozoal RNA profiles as phenotypic markers of male factor status for use as clinical diagnostics of the father's contribution to the birth of a healthy child.
Affiliation(s)
- Meritxell Jodar
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201, USA
|
208
|
Abstract
Systemic response to DNA damage and other stresses is a complex process that includes changes in the regulation and activity of nearly all stages of gene expression. One gene regulatory mechanism used by eukaryotes is selection among alternative transcript isoforms that differ in polyadenylation [poly(A)] sites, resulting in changes either to the coding sequence or to portions of the 3' UTR that govern translation, stability, and localization. To determine the extent to which this means of regulation is used in response to DNA damage, we conducted a global analysis of poly(A) site usage in Saccharomyces cerevisiae after exposure to the UV mimetic, 4-nitroquinoline 1-oxide (4NQO). Two thousand thirty-one genes were found to have significant variation in poly(A) site distributions following 4NQO treatment, with a strong bias toward loss of short transcripts, including many with poly(A) sites located within the protein coding sequence (CDS). We further explored one possible mechanism that could contribute to the widespread differences in mRNA isoforms. The change in poly(A) site profile was associated with an inhibition of cleavage and polyadenylation in cell extract and a decrease in the levels of several key subunits in the mRNA 3'-end processing complex. Sequence analysis identified differences in the cis-acting elements that flank putatively suppressed and enhanced poly(A) sites, suggesting a mechanism that could discriminate between variable and constitutive poly(A) sites. Our analysis indicates that variation in mRNA length is an important part of the regulatory response to DNA damage.
|
209
|
Spaethling JM, Eberwine JH. Single-cell transcriptomics for drug target discovery. Curr Opin Pharmacol 2013; 13:786-90. [PMID: 23725882 DOI: 10.1016/j.coph.2013.04.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/19/2013] [Revised: 04/25/2013] [Accepted: 04/27/2013] [Indexed: 10/26/2022]
Abstract
Single cell sequencing is currently in its relative infancy although an unprecedented amount of information is already being generated. These techniques are providing new insight into intercellular variability as well as identification of previously unrecognized drug targets. As more groups are gaining an interest in this fruitful technique, new sample preparation techniques, sequencing platforms, and bioinformatics tools are being developed which only improve the quantity and quality of data generated in these studies. Great advancements in harvest (in vivo pipette), sample preparation, and sequencing (Illumina HiSeq 2500/MiSeq, Ion Torrent PGM, Pacific Biosciences RS) are allowing for previously untestable questions to be answered and for expanded accessibility of these technologies.
Affiliation(s)
- Jennifer M Spaethling
- Department of Pharmacology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
210
|
Abstract
RNA sequencing is a recent technology which has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular the discovery of isoforms at the transcript assembly stage is a complex problem and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms which covers the observed reads, then perform a separate algorithm to quantify the isoforms, which can result in a loss of power. Current methods also use ad-hoc solutions to deal with the vast number of possible isoforms which can be constructed from a given set of reads. Finally, while the need of taking into account features such as read pairing and sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features as part of the model. We present Montebello, an integrated statistical approach which performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired.
Affiliation(s)
- David Hiller
- Center for Epigenetics, Johns Hopkins School of Medicine, 855 N. Wolfe St., Rangos 570, Baltimore, MD 21205
- Wing Hung Wong
- Department of Statistics, Sequoia Hall, 390 Serra Mall, Stanford, CA, 94305
|
211
|
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013. [PMID: 23618408 DOI: 10.1101/000851] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/16/2023]
Abstract
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
|
214
|
Pandey RV, Franssen SU, Futschik A, Schlötterer C. Allelic imbalance metre (Allim), a new tool for measuring allele-specific gene expression with RNA-seq data. Mol Ecol Resour 2013; 13:740-5. [PMID: 23615333 PMCID: PMC3739924 DOI: 10.1111/1755-0998.12110] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/22/2013] [Revised: 03/18/2013] [Accepted: 03/22/2013] [Indexed: 11/29/2022]
Abstract
Estimating differences in gene expression among alleles is of high interest for many areas in biology and medicine. Here, we present a user-friendly software tool, Allim, to estimate allele-specific gene expression. Because mapping bias is a major problem for reliable estimates of allele-specific gene expression using RNA-seq, Allim combines two different strategies to account for the mapping biases. In order to reduce the mapping bias, Allim first generates a polymorphism-aware reference genome that accounts for the sequence variation between the alleles. Then, a sequence-specific simulation tool estimates the residual mapping bias. Statistical tests for allelic imbalance are provided that can be used with the bias corrected RNA-seq data.
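The abstract does not say which statistical tests Allim provides, so the following is only a generic illustration of testing for allelic imbalance: a two-sided exact binomial test on reference versus alternative allele read counts. The function name and the `expected_ref_fraction` parameter (standing in for a bias-corrected expectation) are hypothetical and are not part of Allim.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance.

    expected_ref_fraction is the reference-allele fraction expected with no
    imbalance; 0.5 assumes no residual mapping bias (an assumption).
    Intended for modest read counts, since it enumerates all outcomes.
    """
    n = ref_reads + alt_reads
    p = expected_ref_fraction
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)

balanced = allelic_imbalance_p(5, 5)    # equal read support for both alleles
skewed = allelic_imbalance_p(0, 10)     # all reads from one allele
print(balanced, skewed)
```

If a simulation suggested that the reference allele is slightly favoured during mapping, passing that estimate as `expected_ref_fraction` would shift the null accordingly.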
Affiliation(s)
- Ram Vinay Pandey
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
|
215
|
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013; 14:R36. [PMID: 23618408 PMCID: PMC4053844 DOI: 10.1186/gb-2013-14-4-r36] [Citation(s) in RCA: 9602] [Impact Index Per Article: 800.2] [Received: 11/15/2012] [Accepted: 04/25/2013] [Indexed: 11/10/2022]
Abstract
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
|
216
|
Carrara M, Beccuti M, Cavallo F, Donatelli S, Lazzarato F, Cordero F, Calogero RA. State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues? BMC Bioinformatics 2013; 14 Suppl 7:S2. [PMID: 23815381 PMCID: PMC3633050 DOI: 10.1186/1471-2105-14-s7-s2] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/14/2022]
Abstract
Background RNA-seq has the potential to discover genes created by chromosomal rearrangements. Fusion genes, also known as "chimeras", are formed by the breakage and re-joining of two different chromosomes. It is known that chimeras have been implicated in the development of cancer. Few publications in the past showed the presence of fusion events also in normal tissue, but with very limited overlaps between their results. More recently, two fusion genes in normal tissues were detected using both RNA-seq and protein data. Due to heterogeneous results in identifying chimeras in normal tissue, we decided to evaluate the efficacy of state of the art fusion finders in detecting chimeras in RNA-seq data from normal tissues. Results We compared the performance of six fusion-finder tools: FusionHunter, FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion. To evaluate the sensitivity we used a synthetic dataset of fusion-products, called positive dataset; in these experiments FusionMap, FusionFinder, MapSplice, and TopHat-fusion are able to detect more than 78% of fusion genes. All tools were error prone with high variability among the tools, identifying some fusion genes not present in the synthetic dataset. To better investigate the false discovery chimera detection rate, synthetic datasets free of fusion-products, called negative datasets, were used. The negative datasets have different read lengths and quality scores, which allow detecting dependency of the tools on both these features. FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion were error-prone. Only FusionHunter results were free of false positives. FusionMap gave the best compromise in terms of specificity in the negative dataset and of sensitivity in the positive dataset. Conclusions We have observed a dependency of the tools on read length, quality score and on the number of reads supporting each chimera. Thus, it is important to carefully select the software on the basis of the structure of the RNA-seq data under analysis. Furthermore, the sensitivity of chimera detection tools does not seem to be sufficient to provide results consistent with those obtained in normal tissues on the basis of fusion events extracted from published data.
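The evaluation described in this abstract rests on two simple quantities: sensitivity on a positive dataset with known fusions, and the false positive calls that fall outside the known set. A minimal sketch of that bookkeeping follows; the function name and the fusion pairs are illustrative placeholders, not results or data from the study.

```python
def evaluate_fusion_calls(called, known):
    """Return (sensitivity, false_positives) for one tool on one dataset.

    called and known are sets of (five_prime_gene, three_prime_gene) pairs.
    For a negative dataset, pass an empty known set: sensitivity is then
    undefined (None) and every call is a false positive.
    """
    false_positives = called - known
    sensitivity = len(called & known) / len(known) if known else None
    return sensitivity, false_positives

# Hypothetical positive dataset and hypothetical tool output.
known_fusions = {("BCR", "ABL1"), ("EML4", "ALK"),
                 ("TMPRSS2", "ERG"), ("FGFR3", "TACC3")}
tool_calls = {("BCR", "ABL1"), ("EML4", "ALK"),
              ("TMPRSS2", "ERG"), ("GENE_X", "GENE_Y")}

sensitivity, false_positives = evaluate_fusion_calls(tool_calls, known_fusions)
print(sensitivity, sorted(false_positives))
```

Treating the pairs as ordered keeps the 5' and 3' partners distinct; a looser comparison could normalise the order first, at the cost of conflating reciprocal fusions.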
Affiliation(s)
- Matteo Carrara
- University of Torino, Bioinformatics & Genomics unit, Molecular Biotechnology Center, Via Nizza 52, 10126 Torino, Italy
|
217
|
Wu J, Anczuków O, Krainer AR, Zhang MQ, Zhang C. OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Res 2013; 41:5149-63. [PMID: 23571760 PMCID: PMC3664805 DOI: 10.1093/nar/gkt216] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Indexed: 01/21/2023]
Abstract
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (∼14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows–Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
Affiliation(s)
- Jie Wu
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
|
218
|
Tang S, Riva A. PASTA: splice junction identification from RNA-sequencing data. BMC Bioinformatics 2013; 14:116. [PMID: 23557086 PMCID: PMC3623791 DOI: 10.1186/1471-2105-14-116] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/13/2012] [Accepted: 03/19/2013] [Indexed: 02/05/2023]
Abstract
Background Next generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but requires ad-hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data, relying on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions with high accuracy. Results Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it is able to handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions. Conclusions PASTA is a highly efficient and sensitive tool to identify splicing junctions from RNA-Seq data. Compared to similar programs, it has the ability to identify a higher number of real splicing junctions, and provides highly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitates the reconstruction of the splicing isoforms and the analysis of their expression levels, which will be performed by the remaining modules of the PASTA pipeline, still under development. Use of PASTA can therefore enable the large-scale investigation of transcription and alternative splicing.
Affiliation(s)
- Shaojun Tang
- Department of Molecular Genetics and Microbiology, College of Medicine, University of Florida, Gainesville, FL, USA
|
219
|
Focal segmental glomerulosclerosis is induced by microRNA-193a and its downregulation of WT1. Nat Med 2013; 19:481-7. [PMID: 23502960 DOI: 10.1038/nm.3142] [Citation(s) in RCA: 173] [Impact Index Per Article: 14.4] [Received: 10/24/2012] [Accepted: 02/16/2013] [Indexed: 02/08/2023]
Abstract
Focal segmental glomerulosclerosis (FSGS) is a frequent and severe glomerular disease characterized by destabilization of podocyte foot processes. We report that transgenic expression of the microRNA miR-193a in mice rapidly induces FSGS with extensive podocyte foot process effacement. Mechanistically, miR-193a inhibits the expression of the Wilms' tumor protein (WT1), a transcription factor and master regulator of podocyte differentiation and homeostasis. Decreased expression levels of WT1 lead to downregulation of its target genes PODXL (podocalyxin) and NPHS1 (nephrin), as well as several other genes crucial for the architecture of podocytes, initiating a catastrophic collapse of the entire podocyte-stabilizing system. We found upregulation of miR-193a in isolated glomeruli from individuals with FSGS compared to normal kidneys or individuals with other glomerular diseases. Thus, upregulation of miR-193a provides a new pathogenic mechanism for FSGS and is a potential therapeutic target.
|
220
|
Bao E, Jiang T, Girke T. BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinformatics 2013; 29:1250-9. [PMID: 23493323 DOI: 10.1093/bioinformatics/btt127] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
MOTIVATION De novo transcriptome assemblies of RNA-Seq data are important for genomics applications of unsequenced organisms. Owing to the complexity and often incomplete representation of transcripts in sequencing libraries, the assembly of high-quality transcriptomes can be challenging. However, with the rapidly growing number of sequenced genomes, it is now feasible to improve RNA-Seq assemblies by guiding them with genomic sequences. RESULTS This study introduces BRANCH, an algorithm designed for improving de novo transcriptome assemblies by using genomic information that can be partial or complete genome sequences from the same or a related organism. Its input includes assembled RNA reads (transfrags), genomic sequences (e.g. contigs) and the RNA reads themselves. It uses a customized version of BLAT to align the transfrags and RNA reads to the genomic sequences. After identifying exons from the alignments, it defines a directed acyclic graph and maps the transfrags to paths on the graph. It then joins and extends the transfrags by applying an algorithm that solves a combinatorial optimization problem, called the Minimum weight Minimum Path Cover with given Paths. In performance tests on real data from Caenorhabditis elegans and Saccharomyces cerevisiae, assisted by genomic contigs from the same species, BRANCH improved the sensitivity and precision of transfrags generated by Velvet/Oases or Trinity by 5.1-56.7% and 0.3-10.5%, respectively. These improvements added 3.8-74.1% complete transcripts and 8.3-3.8% proteins to the initial assembly. Similar improvements were achieved when guiding the BRANCH processing of a transcriptome assembly from a more complex organism (mouse) with genomic sequences from a related species (rat). AVAILABILITY The BRANCH software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/branch. 
CONTACT thomas.girke@ucr.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ergude Bao
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
|
221
|
Aschoff M, Hotz-Wagenblatt A, Glatting KH, Fischer M, Eils R, König R. SplicingCompass: differential splicing detection using RNA-seq data. Bioinformatics 2013; 29:1141-8. [PMID: 23449093 DOI: 10.1093/bioinformatics/btt101] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Alternative splicing is central for cellular processes and substantially increases transcriptome and proteome diversity. Aberrant splicing events often have pathological consequences and are associated with various diseases and cancer types. The emergence of next-generation RNA sequencing (RNA-seq) provides an exciting new technology to analyse alternative splicing on a large scale. However, algorithms that enable the analysis of alternative splicing from short-read sequencing are not fully established yet and there are still no standard solutions available for a variety of data analysis tasks. RESULTS We present a new method and software to predict genes that are differentially spliced between two different conditions using RNA-seq data. Our method uses geometric angles between the high dimensional vectors of exon read counts. With this, differential splicing can be detected even if the splicing events are composed of higher complexity and involve previously unknown splicing patterns. We applied our approach to two case studies including neuroblastoma tumour data with favourable and unfavourable clinical courses. We show the validity of our predictions as well as the applicability of our method in the context of patient clustering. We verified our predictions by several methods including simulated experiments and complementary in silico analyses. We found a significant number of exons with specific regulatory splicing factor motifs for predicted genes and a substantial number of publications linking those genes to alternative splicing. Furthermore, we could successfully exploit splicing information to cluster tissues and patients. Finally, we found additional evidence of splicing diversity for many predicted genes in normalized read coverage plots and in reads that span exon-exon junctions. 
AVAILABILITY SplicingCompass is licensed under the GNU GPL and freely available as a package in the statistical language R at http://www.ichip.de/software/SplicingCompass.html
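The core geometric idea in this abstract, an angle between vectors of exon read counts, can be illustrated with a short sketch. It assumes plain per-exon counts for one gene in two conditions; the function name is hypothetical, and the normalization and statistical testing that SplicingCompass itself applies are not described in the abstract and are not reproduced here.

```python
import math


def exon_count_angle(counts_a, counts_b):
    """Angle in degrees between two exon read-count vectors of one gene.

    A uniform change in expression scales the vector but keeps its
    direction (angle near 0); a change in relative exon usage rotates it.
    Returns None if either vector is all zeros.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both vectors must cover the same exons")

    dot = sum(a * b for a, b in zip(counts_a, counts_b))
    norm_a = math.sqrt(sum(a * a for a in counts_a))
    norm_b = math.sqrt(sum(b * b for b in counts_b))
    if norm_a == 0 or norm_b == 0:
        return None

    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Doubling every exon count is an expression change only.
    print(exon_count_angle([10, 20, 30], [20, 40, 60]))
    # Completely switching exon usage gives the largest angle for counts.
    print(exon_count_angle([10, 0], [0, 10]))
```

Because the angle ignores vector length, a gene that is merely up- or down-regulated between conditions scores close to zero, which is why an angle is a natural measure for splicing differences rather than expression differences.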
Affiliation(s)
- Moritz Aschoff
- Bioinformatics HUSAR, Genomics Proteomics Core Facility, German Cancer Research Center, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
|
222
|
Hickey RD, Galivo F, Schug J, Brehm MA, Haft A, Wang Y, Benedetti E, Gu G, Magnuson MA, Shultz LD, Lagasse E, Greiner DL, Kaestner KH, Grompe M. Generation of islet-like cells from mouse gall bladder by direct ex vivo reprogramming. Stem Cell Res 2013; 11:503-15. [PMID: 23562832 DOI: 10.1016/j.scr.2013.02.005] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 07/27/2012] [Revised: 02/01/2013] [Accepted: 02/09/2013] [Indexed: 01/19/2023]
Abstract
Cell replacement is an emerging therapy for type 1 diabetes. Pluripotent stem cells have received a lot of attention as a potential source of transplantable β-cells, but their ability to form teratomas poses significant risks. Here, we evaluated the potential of primary mouse gall bladder epithelial cells (GBCs) as targets for ex vivo genetic reprogramming to the β-cell fate. Conditions for robust expansion and genetic transduction of primary GBCs by adenoviral vectors were developed. Using a GFP reporter for insulin, conditions for reprogramming were then optimized. Global expression analysis by RNA-sequencing was used to quantitatively compare reprogrammed GBCs (rGBCs) to true β-cells, revealing both similarities and differences. Adenoviral-mediated expression of NEUROG3, Pdx1, and MafA in GBCs resulted in robust induction of pancreatic endocrine genes, including Ins1, Ins2, Neurod1, Nkx2-2 and Isl1. Furthermore, expression of GBC-specific genes was repressed, including Sox17 and Hes1. Reprogramming was also enhanced by addition of retinoic acid and inhibition of Notch signaling. Importantly, rGBCs were able to engraft long term in vivo and remained insulin-positive for 15 weeks. We conclude that GBCs are a viable source for autologous cell replacement in diabetes, but that complete reprogramming will require further manipulations.
Affiliation(s)
- Raymond D Hickey
- Oregon Stem Cell Center, Papé Family Pediatric Research Institute, Oregon Health & Science University, Portland, OR 97203, USA
|
223
|
State-of-the-art fusion-finder algorithms sensitivity and specificity. BioMed Research International 2013; 2013:340620. [PMID: 23555082 PMCID: PMC3595110 DOI: 10.1155/2013/340620] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 10/04/2012] [Revised: 01/11/2013] [Accepted: 01/15/2013] [Indexed: 11/17/2022]
Abstract
Background. Gene fusions arising from chromosomal translocations have been implicated in cancer. RNA-seq has the potential to discover such rearrangements generating functional proteins (chimera/fusion). Recently, many methods for chimeras detection have been published. However, specificity and sensitivity of those tools were not extensively investigated in a comparative way. Results. We tested eight fusion-detection tools (FusionHunter, FusionMap, FusionFinder, MapSplice, deFuse, Bellerophontes, ChimeraScan, and TopHat-fusion) to detect fusion events using synthetic and real datasets encompassing chimeras. The comparison analysis run only on synthetic data could generate misleading results since we found no counterpart on real dataset. Furthermore, most tools report a very high number of false positive chimeras. In particular, the most sensitive tool, ChimeraScan, reports a large number of false positives that we were able to significantly reduce by devising and applying two filters to remove fusions not supported by fusion junction-spanning reads or encompassing large intronic regions. Conclusions. The discordant results obtained using synthetic and real datasets suggest that synthetic datasets encompassing fusion events may not fully catch the complexity of RNA-seq experiment. Moreover, fusion detection tools are still limited in sensitivity or specificity; thus, there is space for further improvement in the fusion-finder algorithms.
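The two post-hoc filters mentioned in this abstract (dropping fusions with no junction-spanning read support and fusions encompassing large intronic regions) might look roughly like the sketch below. The record fields, the function name and both thresholds are hypothetical; the abstract does not give the cut-offs the authors used, and real ChimeraScan output has its own column layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    # Hypothetical record layout, not the output format of any real tool.
    gene_5p: str
    gene_3p: str
    junction_spanning_reads: int
    intron_span_bp: int


def filter_fusion_calls(calls, min_junction_reads=1, max_intron_span_bp=100_000):
    """Keep calls that pass both filters.

    Both default thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    kept = []
    for call in calls:
        # Filter 1: require reads that span the fusion junction itself.
        if call.junction_spanning_reads < min_junction_reads:
            continue
        # Filter 2: reject calls encompassing a large intronic region.
        if call.intron_span_bp > max_intron_span_bp:
            continue
        kept.append(call)
    return kept


if __name__ == "__main__":
    calls = [
        FusionCall("G1", "G2", junction_spanning_reads=5, intron_span_bp=2_000),
        FusionCall("G3", "G4", junction_spanning_reads=0, intron_span_bp=1_500),
        FusionCall("G5", "G6", junction_spanning_reads=3, intron_span_bp=450_000),
    ]
    for call in filter_fusion_calls(calls):
        print(call.gene_5p, call.gene_3p)
```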
|
224
|
Choi E, Kraus MRC, Lemaire LA, Yoshimoto M, Vemula S, Potter LA, Manduchi E, Stoeckert CJ, Grapin-Botton A, Magnuson MA. Dual lineage-specific expression of Sox17 during mouse embryogenesis. Stem Cells 2013; 30:2297-308. [PMID: 22865702 DOI: 10.1002/stem.1192] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 01/04/2023]
Abstract
Sox17 is essential for both endoderm development and fetal hematopoietic stem cell (HSC) maintenance. While endoderm-derived organs are well known to originate from Sox17-expressing cells, it is less certain whether fetal HSCs also originate from Sox17-expressing cells. By generating a Sox17(GFPCre) allele and using it to assess the fate of Sox17-expressing cells during embryogenesis, we confirmed that both endodermal and a part of definitive hematopoietic cells are derived from Sox17-positive cells. Prior to E9.5, the expression of Sox17 is restricted to the endoderm lineage. However, at E9.5 Sox17 is expressed in the endothelial cells (ECs) at the para-aortic splanchnopleural region that contribute to the formation of HSCs at a later stage. The identification of two distinct progenitor cell populations that express Sox17 at E9.5 was confirmed using fluorescence-activated cell sorting together with RNA-Seq to determine the gene expression profiles of the two cell populations. Interestingly, this analysis revealed differences in the RNA processing of the Sox17 mRNA during embryogenesis. Taken together, these results indicate that Sox17 is expressed in progenitor cells derived from two different germ layers, further demonstrating the complex expression pattern of this gene and suggesting caution when using Sox17 as a lineage-specific marker.
Affiliation(s)
- Eunyoung Choi
- Center for Stem Cell Biology and Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee 37232-0494, USA
|
225
|
Lindner R, Friedel CC. A comprehensive evaluation of alignment algorithms in the context of RNA-seq. PLoS One 2012; 7:e52403. [PMID: 23300661 PMCID: PMC3530550 DOI: 10.1371/journal.pone.0052403] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 08/30/2012] [Accepted: 11/16/2012] [Indexed: 11/25/2022]
Abstract
Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.
Affiliation(s)
- Robert Lindner
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, Heidelberg, Germany
- Caroline C. Friedel
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
|
226
|
Wang Q, Chikina M, Zaslavsky E, Pincas H, Sealfon SC. β-catenin regulates GnRH-induced FSHβ gene expression. Mol Endocrinol 2012; 27:224-37. [PMID: 23211523 DOI: 10.1210/me.2012-1310] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
Abstract
The regulation of gonadotropin synthesis by GnRH plays an essential role in the neuroendocrine control of reproduction. The known signaling mechanisms involved in gonadotropin synthesis have been expanding. For example, involvement of β-catenin in LHβ induction by GnRH has been discovered. We examined the role of β-catenin in FSHβ gene expression in LβT2 gonadotrope cells. GnRH caused a sustained increase in nuclear β-catenin levels, which was significantly reduced by c-Jun N-terminal kinase (JNK) inhibition. Small interfering RNA-mediated knockdown of β-catenin mRNA demonstrated that induction of FSHβ mRNA by GnRH depended on β-catenin and that regulation of FSHβ by β-catenin occurred independently of the JNK-c-jun pathway. β-Catenin depletion had no impact on FSHβ mRNA stability. In LβT2 cells transfected with FSHβ promoter luciferase fusion constructs, GnRH responsiveness was conferred by the proximal promoter (-944/-1) and was markedly decreased by β-catenin knockdown. However, none of the T-cell factor/lymphoid enhancer factor binding sites in that region were required for promoter activation by GnRH. Chromatin immunoprecipitation further corroborated the absence of direct interaction between β-catenin and the 1.8-kb FSHβ promoter. To elucidate the mechanism for the β-catenin effect, we analyzed approximately 1 billion reads of next-generation RNA sequencing β-catenin knockdown assays and selected the nuclear cofactor breast cancer metastasis-suppressor 1-like (Brms1L) as one candidate for further study. Subsequent experiments confirmed that Brms1L mRNA expression was decreased by β-catenin knockdown as well as by JNK inhibition. Furthermore, knockdown of Brms1L significantly attenuated GnRH-induced FSHβ expression. Thus, our findings indicate that the expression of Brms1L depends on β-catenin activity and contributes to FSHβ induction by GnRH.
Affiliation(s)
- Qian Wang
- Department of Neurology, Center for Translational Systems Biology, Mount Sinai School of Medicine, New York, New York 10029, USA
|
227
|
Phan JH, Wu PY, Wang MD. Improving the Flexibility of RNA-Seq Data Analysis Pipelines. IEEE International Workshop on Genomic Signal Processing and Statistics 2012; 2012:70-73. [PMID: 27536420 DOI: 10.1109/gensips.2012.6507729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/10/2023]
Abstract
Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. Thus, a typical RNA-Seq pipeline includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. The choice of mapping algorithm could affect quantification accuracy on a per-dataset basis. Thus, we describe a method that enables the merging of transcriptome and reference genome mapping stages provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline by assessing its potential for novel isoform discovery and by validating its quantification performance using qRT-PCR.
Affiliation(s)
- John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Po-Yen Wu
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
|
228
|
Churbanov A, Milligan B. Accurate diagnostics for Bovine tuberculosis based on high-throughput sequencing. PLoS One 2012; 7:e50147. [PMID: 23226242 PMCID: PMC3511461 DOI: 10.1371/journal.pone.0050147] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 12/29/2011] [Accepted: 10/22/2012] [Indexed: 01/18/2023]
Abstract
Background Bovine tuberculosis (bTB) is an enduring contagious disease of cattle that has caused substantial losses to the global livestock industry. Despite large-scale eradication efforts, bTB continues to persist. Current bTB tests rely on the measurement of immune responses in vivo (skin tests), and in vitro (bovine interferon-γ release assay). Recent developments are characterized by interrogating the expression of an increasing number of genes that participate in the immune response. Currently used assays have the disadvantages of limited sensitivity and specificity, which may lead to incomplete eradication of bTB. Moreover, bTB that reemerges from wild disease reservoirs requires early and reliable diagnostics to prevent further spread. In this work, we use high-throughput sequencing of the peripheral blood mononuclear cells (PBMCs) transcriptome to identify an extensive panel of genes that participate in the immune response. We also investigate the possibility of developing a reliable bTB classification framework based on RNA-Seq reads. Methodology/Principal Findings Pooled PBMC mRNA samples from unaffected calves as well as from those with disease progression of 1 and 2 months were sequenced using the Illumina Genome Analyzer II. More than 90 million reads were splice-aligned against the reference genome, and deposited to the database for further expression analysis and visualization. Using this database, we identified 2,312 genes that were differentially expressed in response to bTB infection (p < 10⁻⁸). We achieved a bTB infected status classification accuracy of more than 99% with split-sample validation on newly designed and learned mixtures of expression profiles. Conclusions/Significance We demonstrated that bTB can be accurately diagnosed at the early stages of disease progression based on RNA-Seq high-throughput sequencing.
The inclusion of multiple genes in the diagnostic panel, combined with the superior sensitivity and broader dynamic range of RNA-Seq, has the potential to improve the accuracy of bTB diagnostics. The computational pipeline used for the project is available from http://code.google.com/p/bovine-tb-prediction.
MESH Headings
- Animals
- Cattle
- Gene Expression Profiling
- High-Throughput Nucleotide Sequencing
- Leukocytes, Mononuclear/cytology
- Leukocytes, Mononuclear/metabolism
- Leukocytes, Mononuclear/microbiology
- Male
- Mycobacterium bovis/immunology
- RNA, Messenger/genetics
- RNA, Messenger/immunology
- Sensitivity and Specificity
- Sequence Analysis, RNA/methods
- Transcriptome
- Tuberculosis, Bovine/diagnosis
- Tuberculosis, Bovine/genetics
- Tuberculosis, Bovine/immunology
- Tuberculosis, Bovine/microbiology
Affiliation(s)
- Alexander Churbanov
- Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China.
|
229
|
Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Cade S, Doherty R, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Hu S, Iodice J, Kissinger JC, Kraemer ET, Li W, Pinney DF, Pitts B, Roos DS, Srinivasamoorthy G, Stoeckert CJ, Wang H, Warrenfeltz S. EuPathDB: the eukaryotic pathogen database. Nucleic Acids Res 2012; 41:D684-91. [PMID: 23175615 PMCID: PMC3531183 DOI: 10.1093/nar/gks1113] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.1] [Indexed: 02/05/2023]
Abstract
EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.
Affiliation(s)
- Cristina Aurrecoechea
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
|
230
|
Xuan J, Yu Y, Qing T, Guo L, Shi L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett 2012; 340:284-95. [PMID: 23174106 DOI: 10.1016/j.canlet.2012.11.025] [Citation(s) in RCA: 199] [Impact Index Per Article: 15.3] [Received: 08/31/2012] [Revised: 11/13/2012] [Accepted: 11/13/2012] [Indexed: 02/06/2023]
Abstract
The advent of next generation sequencing (NGS) technologies has revolutionized the field of genomics, enabling fast and cost-effective generation of genome-scale sequence data with exquisite resolution and accuracy. Over the past years, rapid technological advances led by academic institutions and companies have continued to broaden NGS applications from research to the clinic. A recent crop of discoveries have highlighted the medical impact of NGS technologies on Mendelian and complex diseases, particularly cancer. However, the ever-increasing pace of NGS adoption presents enormous challenges in terms of data processing, storage, management and interpretation as well as sequencing quality control, which hinder the translation from sequence data into clinical practice. In this review, we first summarize the technical characteristics and performance of current NGS platforms. We further highlight advances in the applications of NGS technologies towards the development of clinical diagnostics and therapeutics. Common issues in NGS workflows are also discussed to guide the selection of NGS platforms and pipelines for specific research purposes.
Affiliation(s)
- Jiekun Xuan
- School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China; National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
231
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2012; 29:15-21. [PMID: 23104886 DOI: 10.1093/bioinformatics/bts635] [Citation(s) in RCA: 32095] [Impact Index Per Article: 2468.8] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
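The "sequential maximum mappable seed search" named in the abstract can be illustrated with a toy example. The sketch below is not STAR's implementation: it assumes exact matching only, builds the suffix array naively, and omits the clustering and stitching stage. All function names (`build_suffix_array`, `maximal_mappable_prefix`, `seed_search`) are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction for illustration only; real aligners use far
    # more efficient algorithms for genome-sized inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(text, sa, idx, k):
    # k-th character of the idx-th sorted suffix; "" sorts before any base.
    pos = sa[idx] + k
    return text[pos] if pos < len(text) else ""


def maximal_mappable_prefix(read, text, sa):
    """Return (length, positions) of the longest prefix of `read`
    that occurs exactly in `text`."""
    lo, hi = 0, len(sa)  # interval of suffixes that start with read[:k]
    k = 0
    while k < len(read):
        c = read[k]
        # Lower bound: first suffix in [lo, hi) whose k-th char is >= c.
        a, b = lo, hi
        while a < b:
            m = (a + b) // 2
            if _char_at(text, sa, m, k) < c:
                a = m + 1
            else:
                b = m
        new_lo = a
        # Upper bound: first suffix in [new_lo, hi) whose k-th char is > c.
        b = hi
        while a < b:
            m = (a + b) // 2
            if _char_at(text, sa, m, k) <= c:
                a = m + 1
            else:
                b = m
        new_hi = a
        if new_lo == new_hi:
            break  # no suffix extends the match by this base
        lo, hi = new_lo, new_hi
        k += 1
    positions = sorted(sa[lo:hi]) if k > 0 else []
    return k, positions


def seed_search(read, text, sa):
    """Split `read` into consecutive maximal mappable seeds.
    Each seed is (offset_in_read, length, genome_positions)."""
    seeds = []
    start = 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(read[start:], text, sa)
        if length == 0:
            start += 1  # skip a base that occurs nowhere in the text
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy genome: two "exons" separated by a poly-T "intron".
genome = "GATTACAGGC" + "T" * 10 + "CCGTAGCTAA"
sa = build_suffix_array(genome)
# A read that spans the junction between the two exons.
seeds = seed_search("ACAGGCCCGTAG", genome, sa)
```

In this toy case the first seed ends where the read leaves the first exon, and the second seed starts in the second exon, which is how a seed boundary can hint at a splice junction.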
232
Fonseca NA, Rung J, Brazma A, Marioni JC. Tools for mapping high-throughput sequencing data. Bioinformatics 2012; 28:3169-77. [DOI: 10.1093/bioinformatics/bts605] [Citation(s) in RCA: 207] [Impact Index Per Article: 15.9] [Indexed: 12/18/2022]
233
Palmieri N, Nolte V, Suvorov A, Kosiol C, Schlötterer C. Evaluation of different reference based annotation strategies using RNA-Seq - a case study in Drososphila pseudoobscura. PLoS One 2012; 7:e46415. [PMID: 23056304 PMCID: PMC3463616 DOI: 10.1371/journal.pone.0046415] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/28/2012] [Accepted: 08/30/2012] [Indexed: 11/20/2022]
Abstract
RNA-Seq is a powerful tool for the annotation of genomes, in particular for the identification of isoforms and UTRs. Nevertheless, several software tools exist and no standard strategy to obtain a reliable annotation is yet established. We tested different combinations of the most commonly used reference-based alignment tools (TopHat, GSNAP) in combination with two frequently used reference-based assemblers (Cufflinks, Scripture) and evaluated the potential of RNA-Seq to improve the annotation of Drosophila pseudoobscura. While GSNAP maps a higher proportion of reads, TopHat resulted in a more accurate annotation when used in combination with Cufflinks. Scripture had the lowest sensitivity. Interestingly, after subsampling to the same coverage for GSNAP and TopHat, we find that both mappers have similar performance, implying that the advantage of TopHat is mainly an artifact of the lower coverage. Overall, we observed a low concordance among the different approaches tested both at junction and isoform levels. Using data from both sexes of two adult strains of D. pseudoobscura we detected alternative splicing for about 30% of the FlyBase multiple-exon genes. Moreover, we extended the boundaries for 6523 genes (about 40%). We annotated 669 new genes, 45% of them with splicing evidence. Most of the new genes are located on unassembled contigs, reflecting their incomplete annotation. Finally, we identified 99 additional new genes that are not represented in the current genome contigs of D. pseudoobscura, probably due to location in genomic regions that are difficult to assemble (e.g. heterochromatic regions).
Affiliation(s)
- Nicola Palmieri
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria.
234
Mpn1, mutated in poikiloderma with neutropenia protein 1, is a conserved 3'-to-5' RNA exonuclease processing U6 small nuclear RNA. Cell Rep 2012; 2:855-65. [PMID: 23022480 DOI: 10.1016/j.celrep.2012.08.031] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 07/25/2012] [Revised: 08/31/2012] [Accepted: 08/31/2012] [Indexed: 01/09/2023]
Abstract
Clericuzio-type poikiloderma with neutropenia (PN) is a rare genodermatosis associated with mutations in the C16orf57 gene, which codes for the uncharacterized protein hMpn1. We show here that, in both fission yeasts and humans, Mpn1 processes the spliceosomal U6 small nuclear RNA (snRNA) posttranscriptionally. In Mpn1-deficient cells, U6 molecules carry 3' end polyuridine tails that are longer than those in normal cells and lack a terminal 2',3' cyclic phosphate group. In mpn1Δ yeast cells, U6 snRNA and U4/U6 di-small nuclear RNA protein complex levels are diminished, leading to precursor messenger RNA splicing defects, which are reverted by expression of either yeast or human Mpn1 and by overexpression of U6. Recombinant hMpn1 is a 3'-to-5' RNA exonuclease that removes uridines from U6 3' ends, generating terminal 2',3' cyclic phosphates in vitro. Finally, U6 degradation rates increase in mpn1Δ yeasts and in lymphoblasts established from individuals affected by PN. Our data indicate that Mpn1 promotes U6 stability through 3' end posttranscriptional processing and implicate altered U6 metabolism as a potential mechanism for PN pathogenesis.
235
Nookaew I, Papini M, Pornputtapong N, Scalcinati G, Fagerberg L, Uhlén M, Nielsen J. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Res 2012; 40:10084-97. [PMID: 22965124 PMCID: PMC3488244 DOI: 10.1093/nar/gks804] [Citation(s) in RCA: 215] [Impact Index Per Article: 16.5] [Indexed: 01/08/2023]
Abstract
RNA-seq has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on the S288c genome and the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq), and we explore the consistency between RNA-seq analysis using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The results from differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data.
Affiliation(s)
- Intawat Nookaew
- Novo Nordisk Foundation Center for Biosustainability, Department of Chemical and Biological Engineering, Chalmers University of Technology, SE-41296, Gothenburg, Sweden
236
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
237
Abstract
Leber congenital amaurosis (LCA) is an infantile-onset form of inherited retinal degeneration characterized by severe vision loss [1, 2]. Two-thirds of LCA cases are caused by mutations in 17 known disease genes [3] (RetNet Retinal Information Network). Using exome sequencing, we identified a homozygous missense mutation (c.25G>A, p.Val9Met) in NMNAT1 as likely disease-causing in two siblings of a consanguineous Pakistani kindred affected by LCA. This mutation segregated with disease in their kindred, including in three other children with LCA. NMNAT1 resides in the previously identified LCA9 locus and encodes the nuclear isoform of nicotinamide mononucleotide adenylyltransferase, a rate-limiting enzyme in nicotinamide adenine dinucleotide (NAD+) biosynthesis [4, 5]. Functional studies showed the p.Val9Met mutation decreased NMNAT1 enzyme activity. Sequencing NMNAT1 in 284 unrelated LCA families identified 14 rare mutations in 13 additional affected individuals. These results are the first to link an NMNAT isoform to disease and indicate that NMNAT1 mutations cause LCA.
238
Devonshire AS, Sanders R, Wilkes TM, Taylor MS, Foy CA, Huggett JF. Application of next generation qPCR and sequencing platforms to mRNA biomarker analysis. Methods 2012; 59:89-100. [PMID: 22841564 DOI: 10.1016/j.ymeth.2012.07.021] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 01/09/2012] [Revised: 06/26/2012] [Accepted: 07/16/2012] [Indexed: 12/26/2022]
Abstract
Recent years have seen the emergence of new high-throughput PCR and sequencing platforms with the potential to bring analysis of transcriptional biomarkers to a broader range of clinical applications and to provide increasing depth to our understanding of the transcriptome. We present an overview of how to process clinical samples for RNA biomarker analysis in terms of RNA extraction and mRNA enrichment, and guidelines for sample analysis by RT-qPCR and digital PCR using nanofluidic real-time PCR platforms. The options for quantitative gene expression profiling and whole transcriptome sequencing by next generation sequencing are reviewed alongside the bioinformatic considerations for these approaches. Considering the diverse technologies now available for transcriptome analysis, methods for standardising measurements between platforms will be paramount if their diagnostic impact is to be maximised. Therefore, the use of RNA standards and other reference materials is also discussed.
Affiliation(s)
- Alison S Devonshire
- Molecular and Cell Biology, LGC Limited, Queens Road, Teddington, Middlesex TW11 0LY, UK
239
Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. STATISTICS IN BIOSCIENCES 2012; 5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
240
Chen L. Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing. STATISTICS IN BIOSCIENCES 2012; 5:138-155. [PMID: 24058384 DOI: 10.1007/s12561-012-9064-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Abstract
The burgeoning field of high-throughput sequencing significantly improves our ability to understand the complexity of transcriptomes. Alternative splicing, as one of the most important driving forces for transcriptome diversity, can now be studied at an unprecedented resolution. Efficient and powerful computational and statistical methods are urgently needed to facilitate the characterization and quantification of alternative splicing events. Here we discuss methods in splice junction read mapping, and methods in exon-centric or isoform-centric quantification of alternative splicing. In addition, we discuss HITS-CLIP and splicing QTL analyses, which are novel high-throughput sequencing based approaches in the dissection of splicing regulation.
Affiliation(s)
- Liang Chen
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
241
Abstract
Accurately mapping RNA-Seq reads to the reference genome is a critical step for performing downstream analysis such as transcript assembly, isoform detection and quantification. Many tools have been developed; however, given the huge size of the next generation sequencing datasets and the complexity of the transcriptome, RNA-Seq read mapping remains a challenge with the ever-increasing amount of data. We develop Omicsoft sequence aligner (OSA), a fast and accurate alignment tool for RNA-Seq data. Benchmarked with existing methods, OSA improves mapping speed 4-10-fold with better sensitivity and fewer false positives. AVAILABILITY: OSA can be downloaded from http://omicsoft.com/osa. It is free to academic users. OSA has been tested extensively on Linux, Mac OS X and Windows platforms.
Affiliation(s)
- Jun Hu
- Division of Bioinformatics, Omicsoft Inc., 164 Quade Drive, Cary, NC 27513, USA.
242
Bonfert T, Csaba G, Zimmer R, Friedel CC. A context-based approach to identify the most likely mapping for RNA-seq experiments. BMC Bioinformatics 2012; 13 Suppl 6:S9. [PMID: 22537048 PMCID: PMC3358662 DOI: 10.1186/1471-2105-13-s6-s9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Background: Sequencing of mRNA (RNA-seq) by next generation sequencing technologies is widely used for analyzing the transcriptomic state of a cell. Here, one of the main challenges is the mapping of a sequenced read to its transcriptomic origin. As a simple alignment to the genome will fail to identify reads crossing splice junctions and a transcriptome alignment will miss novel splice sites, several approaches have been developed for this purpose. Most of these approaches have two drawbacks. First, each read is assigned to a location independent of whether the corresponding gene is expressed or not, i.e. information from other reads is not taken into account. Second, in case of multiple possible mappings, the mapping with the fewest mismatches is usually chosen, which may lead to wrong assignments due to sequencing errors. Results: To address these problems, we developed ContextMap, which efficiently uses information on the context of a read, i.e. reads mapping to the same expressed region. The context information is used to resolve possible ambiguities and, thus, a much larger degree of ambiguity can be allowed in the initial stage in order to detect all possible candidate positions. Although ContextMap can be used as a stand-alone version using either a genome or transcriptome as input, the version presented in this article is focused on refining initial mappings provided by other mapping algorithms. Evaluation results on simulated sequencing reads showed that the application of ContextMap to either TopHat or MapSplice mappings improved the mapping accuracy of both initial mappings considerably. Conclusions: In this article, we show that the context of reads mapping to nearby locations provides valuable information for identifying the best unique mapping for a read. Using our method, mappings provided by other state-of-the-art methods can be refined and alignment accuracy can be further improved. Availability: http://www.bio.ifi.lmu.de/ContextMap.
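The idea of using read context to resolve ambiguous mappings can be sketched in a few lines. This is a deliberately simplified illustration, not ContextMap's actual scoring scheme: it assumes that support comes only from uniquely mapped reads within a fixed window, and the function name `resolve_by_context` and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def resolve_by_context(candidates, window=100):
    """Pick one location per read.

    candidates: dict mapping read_id -> list of (chrom, pos) candidates.
    A multi-mapped read is assigned to the candidate with the most
    uniquely mapped reads within `window` bases (ties keep the first).
    """
    # Collect positions of reads that have exactly one candidate.
    unique_positions = defaultdict(list)
    for locs in candidates.values():
        if len(locs) == 1:
            chrom, pos = locs[0]
            unique_positions[chrom].append(pos)

    def support(loc):
        # Number of uniquely mapped reads near this candidate location.
        chrom, pos = loc
        return sum(1 for p in unique_positions.get(chrom, ())
                   if abs(p - pos) <= window)

    chosen = {}
    for read_id, locs in candidates.items():
        if len(locs) == 1:
            chosen[read_id] = locs[0]
        else:
            chosen[read_id] = max(locs, key=support)
    return chosen


# r3 maps equally well to two places; only chr1 has neighbouring reads.
example = {
    "r1": [("chr1", 100)],
    "r2": [("chr1", 120)],
    "r3": [("chr1", 110), ("chr2", 500)],
}
assignment = resolve_by_context(example)
```

Here `r3` is placed on chr1 because the other reads indicate that region is expressed, whereas a rule based purely on mismatch counts could not distinguish the two candidates.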
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig-Maximilians-University Munich, Amalienstr, 17, 80333 Munich, Germany
243
Hughes ME, Grant GR, Paquin C, Qian J, Nitabach MN. Deep sequencing the circadian and diurnal transcriptome of Drosophila brain. Genome Res 2012; 22:1266-81. [PMID: 22472103 PMCID: PMC3396368 DOI: 10.1101/gr.128876.111] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Indexed: 11/24/2022]
Abstract
Eukaryotic circadian clocks include transcriptional/translational feedback loops that drive 24-h rhythms of transcription. These transcriptional rhythms underlie oscillations of protein abundance, thereby mediating circadian rhythms of behavior, physiology, and metabolism. Numerous studies over the last decade have used microarrays to profile circadian transcriptional rhythms in various organisms and tissues. Here we use RNA sequencing (RNA-seq) to profile the circadian transcriptome of Drosophila melanogaster brain from wild-type and period-null clock-defective animals. We identify several hundred transcripts whose abundance oscillates with 24-h periods in either constant darkness or 12 h light/dark diurnal cycles, including several noncoding RNAs (ncRNAs) that were not identified in previous microarray studies. Of particular interest are U snoRNA host genes (Uhgs), a family of diurnal cycling noncoding RNAs that encode the precursors of more than 50 box-C/D small nucleolar RNAs, key regulators of ribosomal biogenesis. Transcriptional profiling at the level of individual exons reveals alternative splice isoforms for many genes whose relative abundances are regulated by either period or circadian time, although the effect of circadian time is muted in comparison to that of period. Interestingly, period loss of function significantly alters the frequency of RNA editing at several editing sites, suggesting an unexpected link between a key circadian gene and RNA editing. We also identify tens of thousands of novel splicing events beyond those previously annotated by the modENCODE Consortium, including several that affect key circadian genes. These studies demonstrate extensive circadian control of ncRNA expression, reveal the extent of clock control of alternative splicing and RNA editing, and provide a novel, genome-wide map of splicing in Drosophila brain.
Affiliation(s)
- Michael E Hughes
- Department of Cellular and Molecular Physiology, and Program in Cellular Neuroscience, Neurodegeneration and Repair, Yale School of Medicine, New Haven, Connecticut 06520, USA
244
Affiliation(s)
- Li Cai
- Department of Biomedical Engineering, Rutgers University, Piscataway, USA
245
Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 2011; 13:36-46. [PMID: 22124482 DOI: 10.1038/nrg3117] [Citation(s) in RCA: 1122] [Impact Index Per Article: 80.1] [Indexed: 12/16/2022]
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.