1
Anatskaya OV, Runov AL, Ponomartsev SV, Vonsky MS, Elmuratov AU, Vinogradov AE. Long-Term Transcriptomic Changes and Cardiomyocyte Hyperpolyploidy after Lactose Intolerance in Neonatal Rats. Int J Mol Sci 2023; 24:7063. [PMID: 37108224 PMCID: PMC10138443 DOI: 10.3390/ijms24087063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2023] [Revised: 04/02/2023] [Accepted: 04/08/2023] [Indexed: 04/29/2023]
Abstract
Many cardiovascular diseases originate from growth retardation, inflammation, and malnutrition during early postnatal development. The nature of this phenomenon is not completely understood. Here we aimed to verify the hypothesis that systemic inflammation triggered by neonatal lactose intolerance (NLI) may exert long-term pathologic effects on cardiac developmental programs and cardiomyocyte transcriptome regulation. Using a rat model of NLI triggered by overloading lactase with lactose, together with cytophotometry, image analysis, and mRNA-seq, we evaluated cardiomyocyte ploidy, signs of DNA damage, and NLI-associated long-term transcriptomic changes in genes and gene modules that differed qualitatively (i.e., were switched on or off) between the experimental and control groups. Our data indicated that NLI triggers long-term growth retardation of the animals, cardiomyocyte hyperpolyploidy, and extensive transcriptomic rearrangements. Many of these rearrangements are known manifestations of heart pathologies, including DNA and telomere instability, inflammation, fibrosis, and reactivation of the fetal gene program. Moreover, bioinformatic analysis identified possible causes of these pathologic traits, including impaired signaling via thyroid hormone, calcium, and glutathione. We also found transcriptomic manifestations of increased cardiomyocyte polyploidy, such as the induction of gene modules related to open chromatin, e.g., "negative regulation of chromosome organization", "transcription", and "ribosome biogenesis". These findings suggest that ploidy-related epigenetic alterations acquired in the neonatal period permanently rewire gene regulatory networks and alter the cardiomyocyte transcriptome. Here we provide the first evidence indicating that NLI can be an important trigger of developmental programming of adult cardiovascular disease. These results may help in developing preventive strategies to reduce the NLI-associated adverse effects of inflammation on the developing cardiovascular system.
Affiliation(s)
- Andrey L. Runov
- The D.I. Mendeleev All-Russian Institute for Metrology (VNIIM), Moskovsky ave 19, Saint Petersburg 190005, Russia
- Almazov Medical Research Centre, Akkuratova Street 2, Saint Petersburg 197341, Russia
- Maxim S. Vonsky
- The D.I. Mendeleev All-Russian Institute for Metrology (VNIIM), Moskovsky ave 19, Saint Petersburg 190005, Russia
- Almazov Medical Research Centre, Akkuratova Street 2, Saint Petersburg 197341, Russia
- Artem U. Elmuratov
- Medical Genetics Centre Genotek, Nastavnichesky Alley 17-1-15, Moscow 105120, Russia
2
Wei ZG, Fan XG, Zhang H, Zhang XD, Liu F, Qian Y, Zhang SW. kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph. Front Genet 2022; 13:890651. [PMID: 35601495 PMCID: PMC9117619 DOI: 10.3389/fgene.2022.890651] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/06/2022] [Accepted: 04/07/2022] [Indexed: 11/13/2022]
Abstract
With the rapid development of single-molecule sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, read lengths continue to increase, which holds great potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computationally intensive step for downstream analysis. However, these long reads have higher sequencing error rates and span the breakpoints of structural variants (SVs) more frequently than shorter reads do, so most state-of-the-art mappers leave many reads unaligned or only partially aligned. As a result, these methods usually produce local mapping results for the query read rather than a whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper specifically designed to align long, noisy SMS reads to a reference sequence. Through exhaustive benchmarking experiments on both simulated and real SMS datasets comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrated that kngMap has higher sensitivity and aligns more reads and bases to the reference genome; meanwhile, kngMap can produce consecutive alignments for the whole read that span different categories of SVs. kngMap is implemented in C++ and supports multi-threading; its source code is freely available for academic use at https://github.com/zhang134/kngMap.
Affiliation(s)
- Ze-Gang Wei
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Xing-Guo Fan
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Hao Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Xiao-Dan Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Fei Liu
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Yu Qian
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- *Correspondence: Yu Qian; Shao-Wu Zhang
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
3
New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies. Neural Comput Appl 2021; 33:15669-15692. [PMID: 34155424 PMCID: PMC8208613 DOI: 10.1007/s00521-021-06188-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/27/2020] [Accepted: 06/02/2021] [Indexed: 12/13/2022]
Abstract
During the last 15 years, improved omics sequencing technologies have expanded the scale and resolution of various biological applications, generating high-throughput datasets that require carefully chosen software tools for processing. Following these sequencing developments, bioinformatics researchers have therefore been challenged to implement alignment algorithms for next-generation sequencing reads. However, the selection of aligners based on genome characteristics remains poorly studied, so our benchmarking study extended the state of the art by comparing 17 different aligners. The chosen tools were assessed on empirical human DNA- and RNA-Seq data, as well as on simulated human and mouse datasets, evaluating a set of parameters not previously considered in such benchmarks. As expected, we found that each tool performed best under specific conditions. For Ion Torrent single-end RNA-Seq samples, the most suitable aligners were CLC and BWA-MEM, which achieved the best results in terms of efficiency, accuracy, duplication rate, saturation profile, and running time. For Illumina paired-end osteomyelitis transcriptomics data, instead, the best performer, together with CLC, was Novoalign, which excelled in the accuracy and saturation analyses. Segemehl and DNASTAR performed best on both DNA-Seq datasets, with Segemehl particularly suitable for exome data. In conclusion, our study could guide users in selecting a suitable aligner based on genome and transcriptome characteristics. However, several other aspects that emerged from our work should be considered as alignment research evolves, such as the involvement of artificial intelligence to support cloud computing and mapping to multiple genomes.
4
Koshkin SA, Anatskaya OV, Vinogradov AE, Uversky VN, Dayhoff GW, Bystriakova MA, Pospelov VA, Tolkunova EN. Isolation and Characterization of Human Colon Adenocarcinoma Stem-Like Cells Based on the Endogenous Expression of the Stem Markers. Int J Mol Sci 2021; 22:4682. [PMID: 33925224 PMCID: PMC8124683 DOI: 10.3390/ijms22094682] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/21/2021] [Revised: 04/26/2021] [Accepted: 04/26/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND The self-maintenance of cancer stem cells (CSCs) is regulated via pluripotency pathways that promote the most aggressive tumor phenotype. This study aimed to use the activity of these pathways to enrich the CSC subpopulation and to separate cells characterized by OCT4 and SOX2 expression. METHODS To select and analyze CSCs, we used the SORE6x lentiviral reporter plasmid for viral transduction of colon adenocarcinoma cells. Additionally, we assessed cell chemoresistance and clonogenic, invasive, and migratory activity, and analyzed mRNA-seq data and performed intrinsic disorder predisposition protein analysis (IDPPA). RESULTS We obtained a line of CSC-like cells selected on the basis of the expression of the stem cell factors OCT4 and SOX2. The enriched CSC-like subpopulation had increased chemoresistance as well as increased clonogenic and migratory activity. Bioinformatic analysis of the mRNA-seq data identified the up-regulation of pluripotency, development, drug resistance, and phototransduction pathways, and the down-regulation of pathways related to proliferation, cell cycle, aging, and differentiation. IDPPA indicated that CSC-like cells are predisposed to increased intrinsic protein disorder. CONCLUSION Using the SORE6x reporter construct for CSC enrichment yields a CSC-like population that can serve as a model in the search for new prognostic factors and potential therapeutic targets for colon cancer treatment.
Affiliation(s)
- Sergei A. Koshkin
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
- Department of Medical Oncology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 1015 Walnut Street, Ste. 1024, Philadelphia, PA 19107, USA
- Olga V. Anatskaya
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
- Alexander E. Vinogradov
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Guy W. Dayhoff
- Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, FL 33620, USA
- Margarita A. Bystriakova
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
- Valery A. Pospelov
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
- Elena N. Tolkunova
- Institute of Cytology of the Russian Academy of Science, 194064 St-Petersburg, Russia
5
Next Generation Sequencing Technology in the Clinic and Its Challenges. Cancers (Basel) 2021; 13:cancers13081751. [PMID: 33916923 PMCID: PMC8067551 DOI: 10.3390/cancers13081751] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 03/30/2021] [Accepted: 04/05/2021] [Indexed: 12/12/2022]
Abstract
Simple Summary Precise identification and annotation of mutations are of utmost importance in clinical oncology. Insights into the DNA sequence can provide meaningful knowledge to unravel the underlying genetics of disease. Hence, tailoring personalized medicine often relies on specific genomic alterations for treatment efficacy. The aim of this review is to highlight that sequencing harbors much more than just four nucleotides. Moreover, the gradual transition from first- to second-generation sequencing technologies has raised awareness of the need to choose the most appropriate bioinformatic analytic tools based on the aim, quality, and demands of a specific purpose. Thus, the same raw data can lead to different results, reflecting the intrinsic features of different data-mining pipelines.
Abstract Data analysis has become a crucial aspect of clinical oncology for interpreting the output of next-generation sequencing (NGS)-based testing. Because NGS can resolve billions of sequencing reactions in a few days, it has increased the demand for tools to handle and analyze such large data sets. Many tools, each with its own peculiarities, have been developed since the advent of NGS. Increased awareness when interpreting alterations in the genome is therefore of utmost importance, as analyzing the same data with different tools can yield diverse outcomes. Hence, it is crucial to evaluate and validate bioinformatic pipelines in clinical settings. Moreover, personalized medicine relies on the efficacy of biological drugs that target specific genomic alterations. Here, we focused on different sequencing technologies, features underlying genome complexity, and bioinformatic tools that can impact the final annotation. Additionally, we discuss the clinical demand and design for implementing NGS.
6
Diallo I, Ho J, Laffont B, Laugier J, Benmoussa A, Lambert M, Husseini Z, Soule G, Kozak R, Kobinger GP, Provost P. Altered microRNA Transcriptome in Cultured Human Liver Cells upon Infection with Ebola Virus. Int J Mol Sci 2021; 22:ijms22073792. [PMID: 33917562 PMCID: PMC8038836 DOI: 10.3390/ijms22073792] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/01/2021] [Revised: 03/27/2021] [Accepted: 03/30/2021] [Indexed: 02/07/2023]
Abstract
Ebola virus (EBOV) is a virulent pathogen, notorious for inducing life-threatening hemorrhagic fever, that has been responsible for several outbreaks in Africa and remains a public health threat. Yet its pathogenesis is still not completely understood. Although there have been numerous studies on the host transcriptional response to EBOV, with an emphasis on clinical features, the impact of EBOV infection on post-transcriptional regulatory elements, such as microRNAs (miRNAs), remains largely unexplored. MiRNAs are involved in inflammation and immunity and are believed to be important modulators of the host response to viral infection. Here, we used small RNA sequencing (sRNA-Seq), qPCR, and functional analyses to obtain the first comparative miRNA transcriptome (miRNome) of a human liver cell line (Huh7) infected with one of three EBOV strains: Mayinga (responsible for the first Zaire outbreak in 1976), Makona (responsible for the West Africa outbreak in 2013–2016), and the epizootic Reston (presumably innocuous to humans). Our results highlight specific miRNA-based immunity pathways and substantial differences between the strains beyond their clinical manifestation and pathogenicity. These analyses shed new light on the molecular signature of liver cells upon EBOV infection and provide new insights into miRNA-based virus attack and host defense strategies.
Affiliation(s)
- Idrissa Diallo
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Jeffrey Ho
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Benoit Laffont
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Jonathan Laugier
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Abderrahim Benmoussa
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Marine Lambert
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Zeinab Husseini
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Geoff Soule
- Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
- Robert Kozak
- Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
- Division of Microbiology, Department of Laboratory Medicine & Molecular Diagnostics, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
- Gary P. Kobinger
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
- Département de Microbiologie Médicale, Université du Manitoba, Winnipeg, MB R3E 0J9, Canada
- Patrick Provost
- CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- CHUQ Research Center/CHUL Pavilion, 2705 Blvd Laurier, Room T1-65, Quebec, QC G1V 4G2, Canada
- Correspondence: Tel.: +1-418-525-4444 (ext. 48842)
7
Galise TR, Esposito S, D'Agostino N. Guidelines for Setting Up a mRNA Sequencing Experiment and Best Practices for Bioinformatic Data Analysis. Methods Mol Biol 2021; 2264:137-162. [PMID: 33263908 DOI: 10.1007/978-1-0716-1201-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
RNA sequencing, commonly referred to as RNA-seq, is the most recently developed method for the analysis of transcriptomes. It uses high-throughput next-generation sequencing technologies and has revolutionized our understanding of the complexity and dynamics of whole transcriptomes. In this chapter, we recall the key developments in transcriptome analysis and dissect the steps of the general workflow that users can follow to design and perform an mRNA-seq experiment and to process mRNA-seq data obtained with Illumina technology. The chapter proposes guidelines for properly completing an mRNA-seq study and provides recommendations for best practices based on recent literature and on the latest developments in technology and algorithms. We also note the large number of available choices (especially for bioinformatic data analysis), which may leave the scientist unsure how to proceed. In the last part of the chapter, we discuss the new frontiers of single-cell RNA-seq and isoform sequencing by long-read technology.
Affiliation(s)
- Teresa Rosa Galise
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
- Salvatore Esposito
- CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy
- Nunzio D'Agostino
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
8
Kanzi AM, San JE, Chimukangara B, Wilkinson E, Fish M, Ramsuran V, de Oliveira T. Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance. Front Genet 2020; 11:544162. [PMID: 33193618 PMCID: PMC7649788 DOI: 10.3389/fgene.2020.544162] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/20/2020] [Accepted: 09/21/2020] [Indexed: 12/29/2022]
Abstract
Mendelian and complex genetic trait diseases continue to burden society both socially and economically. The lack of effective tests has hampered diagnosis; thus, affected individuals lack a proper prognosis. Mendelian diseases are caused by mutations in a single gene, whereas complex trait diseases are caused by the accumulation of mutations in either linked or unlinked genomic regions. Significant advances have been made in identifying novel disease-associated mutations, especially with the introduction of next-generation and third-generation sequencing. Regardless, some diseases still lack a diagnosis, as most tests rely on SNP genotyping panels developed from population-based genetic analyses. Analysis of family genetic inheritance using whole genomes, whole exomes, or a panel of genes has been shown to be effective in identifying disease-causing mutations. In this review, we discuss next-generation and third-generation sequencing platforms, bioinformatic tools, and genetic resources commonly used to analyze family-based genomic data, with a focus on identifying inherited or novel disease-causing mutations. We also highlight the analytical, ethical, and regulatory challenges associated with analyzing personal genomes, which constitute the data used to study family genetic inheritance.
Affiliation(s)
- Aquillah M. Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
9
Tong L, Wu PY, Phan JH, Hassazadeh HR, Tong W, Wang MD. Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction. Sci Rep 2020; 10:17925. [PMID: 33087762 PMCID: PMC7578822 DOI: 10.1038/s41598-020-74567-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/22/2019] [Accepted: 08/27/2020] [Indexed: 11/23/2022]
Abstract
To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the joint effects of RNA-seq pipeline components on gene expression estimation as well as on the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline's performance in gene expression estimation. We then investigated the correlation between the proposed metrics and downstream prediction performance using two real-world cancer datasets (i.e., the SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly affected the accuracy of gene expression estimation, and that this impact extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimates tended to perform better in predicting disease outcome. Finally, we provided scenarios as guidelines for using these three metrics to select sensible RNA-seq pipelines that improve the accuracy, precision, and reliability of gene expression estimation and, in turn, downstream gene expression-based prediction of disease outcome.
Affiliation(s)
- Li Tong
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Po-Yen Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Hamid R Hassazadeh
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
10
Mahmood K, Orabi J, Kristensen PS, Sarup P, Jørgensen LN, Jahoor A. De novo transcriptome assembly, functional annotation, and expression profiling of rye (Secale cereale L.) hybrids inoculated with ergot (Claviceps purpurea). Sci Rep 2020; 10:13475. [PMID: 32778722 PMCID: PMC7417550 DOI: 10.1038/s41598-020-70406-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/16/2020] [Accepted: 07/24/2020] [Indexed: 12/22/2022]
Abstract
Rye is used for food, feed, and bioenergy production and remains an essential grain crop for cool temperate zones with marginal soils. Ergot is known to cause severe problems in cross-pollinated rye by contaminating harvested grain. The molecular mechanisms underlying this disease are still poorly understood because of its complex infection pattern. RNA sequencing can provide remarkable detail about the transcriptional landscape; hence, we employed a transcriptomic approach to identify genes involved in the mechanism of ergot infection in rye. In this study, we generated de novo assemblies from twelve biological samples of two rye hybrids with contrasting phenotypic responses to ergot infection. The final transcriptomes of the ergot-susceptible (DH372) and moderately ergot-resistant (Helltop) hybrids contain 208,690 and 192,116 contigs, respectively. Using the BUSCO pipeline, we confirmed that these transcriptome assemblies represent more than 90% of the available orthologue groups in Viridiplantae odb10. We used both the de novo assembly and the draft reference genome of rye to count the differentially expressed genes (DEGs) between the two hybrids with and without inoculation. The gene expression comparisons revealed that 228 genes were linked to ergot infection in both hybrids. Gene Ontology enrichment analysis associated the DEGs with metabolic processes, hydrolase activity, pectinesterase activity, cell wall modification, pollen development, and pollen wall assembly. In addition, gene set enrichment analysis linked the DEGs to cell wall modification and pectinesterase activity. These results suggest that a combination of different pathways, particularly cell wall modification and pectinesterase activity, contributes to the underlying mechanism that might lead to resistance against ergot in rye. Our results may pave the way for selecting genetic material to improve resistance against ergot through a better understanding of the mechanism of ergot infection at the molecular level. Furthermore, the sequence data and de novo assemblies are valuable scientific resources for future studies in rye.
Affiliation(s)
- Khalid Mahmood
- Nordic Seed A/S, Grindsnabevej 25, 8300, Odder, Denmark
- Department of Agroecology, Faculty of Science and Technology, Aarhus University, Forsøgsvej 1, Flakkebjerg, 4200, Slagelse, Denmark
- Jihad Orabi
- Nordic Seed A/S, Grindsnabevej 25, 8300, Odder, Denmark
- Lise Nistrup Jørgensen
- Department of Agroecology, Faculty of Science and Technology, Aarhus University, Forsøgsvej 1, Flakkebjerg, 4200, Slagelse, Denmark
- Ahmed Jahoor
- Nordic Seed A/S, Grindsnabevej 25, 8300, Odder, Denmark
- Department of Plant Breeding, The Swedish University of Agricultural Sciences, 23053, Alnarp, Sweden
11
Transcriptomic Data Sets To Determine Gene Expression Changes Mediated by the Presence of PBT2 in Growth Medium of Multidrug-Resistant Neisseria gonorrhoeae WHO Z. Microbiol Resour Announc 2020; 9:9/21/e00283-20. [PMID: 32439664 PMCID: PMC7242666 DOI: 10.1128/mra.00283-20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
Neisseria gonorrhoeae causes the sexually transmitted infection gonorrhea. High-coverage (∼3,300-fold) transcriptome sequencing data have been collected from multidrug-resistant N. gonorrhoeae strain WHO Z grown in the presence and absence of PBT2.
12
Morgulis A, Agarwala R. SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees. Gigascience 2020; 9:giaa023. [PMID: 32315028 PMCID: PMC7172022 DOI: 10.1093/gigascience/giaa023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/13/2018] [Revised: 08/15/2019] [Indexed: 11/12/2022]
Abstract
BACKGROUND Alignment of sequence reads generated by next-generation sequencing is an integral part of most pipelines that analyze next-generation sequencing data. A number of tools designed to quickly align large volumes of sequences are already available. However, most existing tools lack explicit guarantees about their output. They also do not support searching genome assemblies, such as the human genome assembly GRCh38, that include primary and alternate sequences and information on the placement of alternate sequences relative to primary sequences in the assembly. FINDINGS This paper describes SRPRISM (Single Read Paired Read Indel Substitution Minimizer), a tool for aligning reads without splicing. SRPRISM has features not available in most tools, such as (i) support for searching genome assemblies with alternate sequences, (ii) partial alignment of reads, with a specified region of each read required to be included in the alignment, (iii) a choice of ranking schemes for alignments, and (iv) explicit criteria for search sensitivity. We compare the performance of SRPRISM with GEM, Kart, STAR, BWA-MEM, Bowtie2, Hobbes, and Yara using benchmark sets of paired and single reads of lengths 100 and 250 bp generated with DWGSIM. SRPRISM produced the best results for most benchmark sets with error rates of up to ∼2.5%, and GEM performed best at higher error rates. SRPRISM was also more sensitive than the other tools, even when sensitivity was reduced to improve run-time performance. CONCLUSIONS We present SRPRISM as a flexible read mapping tool that provides explicit guarantees on its results.
Affiliation(s)
- Aleksandr Morgulis
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Richa Agarwala
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
13
de Dios R, Rivas-Marin E, Santero E, Reyes-Ramírez F. Two paralogous EcfG σ factors hierarchically orchestrate the activation of the General Stress Response in Sphingopyxis granuli TFA. Sci Rep 2020; 10:5177. [PMID: 32198475 PMCID: PMC7083833 DOI: 10.1038/s41598-020-62101-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/03/2019] [Accepted: 02/20/2020] [Indexed: 12/15/2022]
Abstract
Under ever-changing environmental conditions, the General Stress Response (GSR) represents a lifesaver for bacteria in order to withstand hostile situations. In α-proteobacteria, the EcfG-type extracytoplasmic function (ECF) σ factors are the key activators of this response at the transcriptional level. In this work, we address the hierarchical function of the ECF σ factor paralogs EcfG1 and EcfG2 in triggering the GSR in Sphingopyxis granuli TFA and describe the role of EcfG2 as global switch of this response. In addition, we define a GSR regulon for TFA and use in vitro transcription analysis to study the relative contribution of each EcfG paralog to the expression of selected genes. We show that the features of each promoter ultimately dictate this contribution, though EcfG2 always produced more transcripts than EcfG1 regardless of the promoter. These first steps in the characterisation of the GSR in TFA suggest a tight regulation to orchestrate an adequate protective response in order to survive in conditions otherwise lethal.
Affiliation(s)
- Rubén de Dios
- Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas/Junta de Andalucía. Departamento de Biología Molecular e Ingeniería Bioquímica, Seville, Spain
- Elena Rivas-Marin
- Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas/Junta de Andalucía. Departamento de Biología Molecular e Ingeniería Bioquímica, Seville, Spain
- Eduardo Santero
- Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas/Junta de Andalucía. Departamento de Biología Molecular e Ingeniería Bioquímica, Seville, Spain
- Francisca Reyes-Ramírez
- Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas/Junta de Andalucía. Departamento de Biología Molecular e Ingeniería Bioquímica, Seville, Spain
14
Payá-Milans M, Olmstead JW, Nunez G, Rinehart TA, Staton M. Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species. Gigascience 2018; 7:5168871. [PMID: 30418578 PMCID: PMC6275443 DOI: 10.1093/gigascience/giy132] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/19/2018] [Accepted: 10/21/2018] [Indexed: 11/12/2022]
Abstract
Background The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and a polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis. Findings At the preprocessing step, we found that error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis when gentle quality trimming was performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with the size, quality, and completeness of de novo assemblies and with the number of mapped reads. When a reference genome from a related species was used to map reads, the outcome was significantly improved by mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP. Conclusions The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters of de novo assemblies and the availability of reads for downstream analysis.
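The difference between gentle and strict quality trimming largely comes down to the threshold a trimmer applies. The sketch below is a generic sliding-window trimmer written only to illustrate that effect; it is not the algorithm used by Skewer or Trimmomatic, and the function names, the 4-base window, and both thresholds are assumptions.

```python
def phred33_to_scores(qual_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_keep_length(scores, window=4, min_mean=20):
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean quality falls
    below `min_mean`. Reads shorter than `window` are kept whole.
    """
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return start
    return len(scores)


# Hypothetical per-base qualities that decay toward the 3' end of the read.
scores = [30, 30, 30, 30, 30, 25, 18, 15, 10, 8]
gentle = sliding_window_keep_length(scores, min_mean=20)  # keeps 5 bases
strict = sliding_window_keep_length(scores, min_mean=28)  # keeps 3 bases
print(gentle, strict)
```

Lowering `min_mean` keeps more bases per read, which is the sense in which gentler trimming leaves more sequence for downstream steps.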
Affiliation(s)
- Miriam Payá-Milans
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
- James W Olmstead
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
- Gerardo Nunez
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
- Timothy A Rinehart
- Thad Cochran Southern Horticultural Laboratory, USDA-Agricultural Research Service, PO Box 287, Poplarville, MS, 39470, United States; Crop Production and Protection, USDA-Agricultural Research Service, 5601 Sunnyside Ave, Beltsville, MD, 20705, United States
- Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
15
Molecular Genetic Analysis of Human Endometrial Mesenchymal Stem Cells That Survived Sublethal Heat Shock. Stem Cells Int 2017; 2017:2362630. [PMID: 29375621 PMCID: PMC5742502 DOI: 10.1155/2017/2362630] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/20/2017] [Accepted: 07/13/2017] [Indexed: 02/07/2023]
Abstract
High temperature is a critical environmental and personal factor. Although heat shock is a well-studied biological phenomenon, the hyperthermia response of stem cells is poorly understood. Previously, we demonstrated that sublethal heat shock induced premature senescence in human endometrial mesenchymal stem cells (eMSC). This study aimed to investigate the fate of eMSC that survived sublethal heat shock (SHS), with special emphasis on their genetic stability and possible malignant transformation, using methods of classic and molecular karyotyping, next-generation sequencing, and transcriptome functional analysis. G-banding revealed random chromosome breakages and aneuploidy in the SHS-treated eMSC. Molecular karyotyping found no genomic imbalance in these cells. Gene module and protein interaction network analysis of mRNA sequencing data showed that, compared to untreated cells, the SHS-surviving progeny differed somewhat in gene expression. However, no hallmarks of cancer were found. Our data identified downregulation of oncogenic signaling, upregulation of tumor-suppressing and prosenescence signaling, and induction of mismatch and excision DNA repair. The common feature of heated eMSC is the silencing of the MYC and AKT1/PKB oncogenes and of hTERT telomerase. Overall, our data indicate that despite genetic instability, SHS-surviving eMSC do not undergo transformation. After long-term cultivation, these cells, like their unheated counterparts, enter replicative senescence and die.
16
Haley ST, Alexander H, Juhl AR, Dyhrman ST. Transcriptional response of the harmful raphidophyte Heterosigma akashiwo to nitrate and phosphate stress. Harmful Algae 2017; 68:258-270. [PMID: 28962986 DOI: 10.1016/j.hal.2017.07.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/01/2016] [Revised: 06/30/2017] [Accepted: 07/01/2017] [Indexed: 06/07/2023]
Abstract
The marine eukaryotic alga Heterosigma akashiwo (Raphidophyceae) is known for forming ichthyotoxic harmful algal blooms (HABs). In the past 50 years, H. akashiwo blooms have increased, occurring globally in highly eutrophic coastal and estuarine systems. These systems often incur dramatic physicochemical changes, including macronutrient (nitrogen and phosphorus) enrichment and depletion, on short timescales. Here, H. akashiwo cultures grown under nutrient replete, low N and low P growth conditions were examined for changes in biochemical and physiological characteristics in concert with transcriptome sequencing to provide a mechanistic perspective on the metabolic processes involved in responding to N and P stress. There was a marked difference in the overall transcriptional pattern between low N and low P transcriptomes. Both nutrient stresses led to significant changes in the abundance of thousands of contigs related to a wide diversity of metabolic pathways, with limited overlap between the transcriptomic responses to low N and low P. Enriched contigs under low N included many related to nitrogen metabolism, acquisition, and transport. In addition, metabolic modules like photosynthesis and carbohydrate metabolism changed significantly under low N, coincident with treatment-specific changes in photosynthetic efficiency and particulate carbohydrate content. P-specific contigs responsible for P transport and organic P use were more enriched in the low P treatment than in the replete control and low N treatment. These results provide new insight into the genetic mechanisms that distinguish how this HAB species responds to these two common nutrient stresses, and the results can inform future field studies, linking transcriptional patterns to the physiological ecology of H. akashiwo in situ.
Affiliation(s)
- Sheean T Haley
- Columbia University, Lamont-Doherty Earth Observatory, Palisades, NY, USA
- Harriet Alexander
- Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA, USA
- Andrew R Juhl
- Columbia University, Lamont-Doherty Earth Observatory, Palisades, NY, USA; Columbia University, Department of Earth and Environmental Sciences, Palisades, NY, USA
- Sonya T Dyhrman
- Columbia University, Lamont-Doherty Earth Observatory, Palisades, NY, USA; Columbia University, Department of Earth and Environmental Sciences, Palisades, NY, USA
17
Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics. Genomics 2017; 109:186-191. [PMID: 28286147 DOI: 10.1016/j.ygeno.2017.03.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 08/04/2016] [Revised: 03/07/2017] [Accepted: 03/08/2017] [Indexed: 01/18/2023]
Abstract
The massive data produced by next-generation sequencing (NGS) technology are widely used in biological research and medical diagnosis. The crucial step in NGS analysis is read alignment, or mapping, which is computationally intensive and complex. Mapping bias tends to affect downstream analysis, including the detection of polymorphisms. To provide biologists with guidelines for selecting suitable aligners, we evaluated and benchmarked 5 different aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on the characteristics of 5 microbial genomes. Two million simulated read pairs of various lengths (36, 50, 72, 100, 125, 150, 200, 250 and 300 bp) were aligned. Specific alignment features such as mapping sensitivity, percentage of properly paired reads, alignment time and the effect of tandem repeats on incorrectly mapped reads were evaluated. BWA was the fastest aligner, followed by Bowtie2 and Smalt; NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (>100 bp). NovoAlign, by contrast, showed higher sensitivity for both short (36, 50 and 72 bp) and long (>100 bp) reads; it also showed higher sensitivity when mapping a complex genome such as Plasmodium falciparum. The percentage of properly paired reads aligned by NovoAlign, BWA and Stampy was markedly higher. None of the aligners outperformed the others across the benchmark; however, their performance varied with genome characteristics. We expect that the results of this study will help end users choose an aligner and thus enhance the accuracy of read mapping.
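Mapping sensitivity on simulated data, one of the features evaluated above, can be computed by comparing each read's simulated origin with the position the aligner reports. A minimal sketch, assuming a hypothetical `SimulatedRead` record and a ±5 bp tolerance window; the abstract does not state the exact correctness criterion the study used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimulatedRead:
    """Hypothetical record pairing a read's simulated origin with the aligner's answer."""
    read_id: str
    true_chrom: str
    true_pos: int
    mapped_chrom: Optional[str] = None  # None means the aligner left the read unmapped
    mapped_pos: Optional[int] = None


def mapping_sensitivity(reads: List[SimulatedRead], tolerance: int = 5) -> float:
    """Fraction of all simulated reads placed within `tolerance` bp of their true origin."""
    if not reads:
        return 0.0
    correct = sum(
        1
        for r in reads
        if r.mapped_chrom == r.true_chrom
        and r.mapped_pos is not None
        and abs(r.mapped_pos - r.true_pos) <= tolerance
    )
    return correct / len(reads)


reads = [
    SimulatedRead("r1", "chr1", 1000, "chr1", 1002),  # correct: 2 bp off
    SimulatedRead("r2", "chr1", 5000, "chr2", 5000),  # wrong chromosome
    SimulatedRead("r3", "chr2", 300),                 # unmapped
    SimulatedRead("r4", "chr2", 800, "chr2", 800),    # exact hit
]
print(mapping_sensitivity(reads))               # 0.5
print(mapping_sensitivity(reads, tolerance=1))  # 0.25
```

Unmapped reads stay in the denominator, so this measure penalizes an aligner both for misplacing reads and for declining to place them.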
18
Prediction of Poly(A) Sites by Poly(A) Read Mapping. PLoS One 2017; 12:e0170914. [PMID: 28135292 PMCID: PMC5279776 DOI: 10.1371/journal.pone.0170914] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/24/2016] [Accepted: 01/12/2017] [Indexed: 11/19/2022]
Abstract
RNA-seq reads containing part of the poly(A) tail of transcripts (denoted as poly(A) reads) provide the most direct evidence for the position of poly(A) sites in the genome. However, due to reduced coverage of poly(A) tails by reads, poly(A) reads are not routinely identified during RNA-seq mapping. Nevertheless, recent studies for several herpesviruses successfully employed mapping of poly(A) reads to identify herpesvirus poly(A) sites using different strategies and customized programs. To more easily allow such analyses without requiring additional programs, we integrated poly(A) read mapping and prediction of poly(A) sites into our RNA-seq mapping program ContextMap 2. The implemented approach essentially generalizes previously used poly(A) read mapping approaches and combines them with the context-based approach of ContextMap 2 to take into account information provided by other reads aligned to the same location. Poly(A) read mapping using ContextMap 2 was evaluated on real-life data from the ENCODE project and compared against a competing approach based on transcriptome assembly (KLEAT). This showed high positive predictive value for our approach, evidenced also by the presence of poly(A) signals, and considerably lower runtime than KLEAT. Although sensitivity is low for both methods, we show that this is in part due to a high extent of spurious results in the gold standard set derived from RNA-PET data. Sensitivity improves for poly(A) sites of known transcripts or determined with a more specific poly(A) sequencing protocol and increases with read coverage on transcript ends. Finally, we illustrate the usefulness of the approach in a high read coverage scenario by a re-analysis of published data for herpes simplex virus 1. Thus, with current trends towards increasing sequencing depth and read length, poly(A) read mapping will prove to be increasingly useful and can now be performed automatically during RNA-seq mapping with ContextMap 2.
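The basic operation behind poly(A) read mapping, separating a mappable genomic prefix from an untemplated A tail, can be illustrated as below. This is a simplified sketch, not ContextMap 2's implementation: it requires an exact run of at least eight A's (or leading T's), and both the threshold and the function name are assumptions.

```python
import re
from typing import Optional, Tuple

# Assumed minimum length of an untemplated A (or T) stretch for a read to count
# as a poly(A) read; real tools tune this and often tolerate mismatches.
MIN_TAIL = 8


def split_polya_read(seq: str, min_tail: int = MIN_TAIL) -> Optional[Tuple[str, int, str]]:
    """Split a read into (genomic_part, tail_length, strand), or return None.

    A trailing run of A's suggests a read from the transcript's own strand ("+");
    a leading run of T's suggests the reverse complement ("-"). The genomic part is
    what would be aligned, and its boundary with the tail marks a candidate
    poly(A) site. Genome-encoded A's right before the tail are indistinguishable
    from the tail under this simple rule.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail:
        return seq[: tail.start()], len(tail.group()), "+"
    head = re.match(r"T{%d,}" % min_tail, seq)
    if head:
        return seq[head.end():], len(head.group()), "-"
    return None


print(split_polya_read("ACGTTGCAGC" + "A" * 10))  # ('ACGTTGCAGC', 10, '+')
print(split_polya_read("T" * 9 + "GGCATCAG"))     # ('GGCATCAG', 9, '-')
print(split_polya_read("ACGTACGTAAAA"))           # None: tail too short
```

Because tail bases are not templated by the genome, only the genomic part can be aligned; the short coverage of tails noted in the abstract is why such reads are rare and easily missed.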
19
Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods 2016; 14:135-139. [PMID: 27941783 DOI: 10.1038/nmeth.4106] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 04/18/2016] [Accepted: 11/15/2016] [Indexed: 01/27/2023]
Abstract
Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We performed a comprehensive benchmarking of 14 common splice-aware aligners for base, read, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings.
20
From next-generation resequencing reads to a high-quality variant data set. Heredity (Edinb) 2016; 118:111-124. [PMID: 27759079 DOI: 10.1038/hdy.2016.102] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 04/24/2016] [Revised: 09/03/2016] [Accepted: 09/06/2016] [Indexed: 12/11/2022]
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the data produced contain subtle but complex types of errors, biases and uncertainties that pose several statistical and computational challenges for the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.
21
Greenwood JM, Ezquerra AL, Behrens S, Branca A, Mallet L. Current analysis of host–parasite interactions with a focus on next generation sequencing data. Zoology 2016; 119:298-306. [DOI: 10.1016/j.zool.2016.06.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/07/2015] [Revised: 06/22/2016] [Accepted: 06/22/2016] [Indexed: 01/21/2023]
22
Ziemann M, Kaspi A, El-Osta A. Evaluation of microRNA alignment techniques. RNA 2016; 22:1120-38. [PMID: 27284164 PMCID: PMC4931105 DOI: 10.1261/rna.055509.115] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/01/2015] [Accepted: 05/04/2016] [Indexed: 05/26/2023]
Abstract
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing.
Affiliation(s)
- Mark Ziemann
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia; Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Antony Kaspi
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia; Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Assam El-Osta
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia; Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
23
Juan-Mateu J, Villate O, Eizirik DL. MECHANISMS IN ENDOCRINOLOGY: Alternative splicing: the new frontier in diabetes research. Eur J Endocrinol 2016; 174:R225-38. [PMID: 26628584 PMCID: PMC5331159 DOI: 10.1530/eje-15-0916] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 09/14/2015] [Accepted: 12/01/2015] [Indexed: 12/30/2022]
Abstract
Type 1 diabetes (T1D) is a chronic autoimmune disease in which pancreatic β cells are killed by infiltrating immune cells and by cytokines released by these cells. This takes place in the context of a dysregulated dialogue between invading immune cells and target β cells, but the intracellular signals that decide β cell fate remain to be clarified. Alternative splicing (AS) is a complex post-transcriptional regulatory mechanism affecting gene expression. It regulates the inclusion/exclusion of exons into mature mRNAs, allowing individual genes to produce multiple protein isoforms that expand the proteome diversity. Functionally related transcript populations are co-ordinately spliced by master splicing factors, defining regulatory networks that allow cells to rapidly adapt their transcriptome in response to intra- and extracellular cues. There is a growing interest in the role of AS in autoimmune diseases, but little is known regarding its role in T1D. In this review, we discuss recent findings suggesting that splicing events occurring in both immune and pancreatic β cells contribute to the pathogenesis of T1D. Splicing switches in T cells and in lymph node stromal cells are involved in the modulation of the immune response against β cells, while β cells exposed to pro-inflammatory cytokines activate complex splicing networks that modulate β cell viability, expression of neoantigens and susceptibility to immune-induced stress. Unveiling the role of AS in β cell functional loss and death will increase our understanding of T1D pathogenesis and may open new avenues for disease prevention and therapy.
Affiliation(s)
- Jonàs Juan-Mateu
- Medical Faculty, ULB Center for Diabetes Research and Welbio, Université Libre de Bruxelles (ULB), Route de Lennik, 808 - CP618, B-1070 Brussels, Belgium
- Olatz Villate
- Medical Faculty, ULB Center for Diabetes Research and Welbio, Université Libre de Bruxelles (ULB), Route de Lennik, 808 - CP618, B-1070 Brussels, Belgium
- Décio L Eizirik
- Medical Faculty, ULB Center for Diabetes Research and Welbio, Université Libre de Bruxelles (ULB), Route de Lennik, 808 - CP618, B-1070 Brussels, Belgium
24
Abstract
The zebrafish has emerged as an important model for studying cancer biology. Identification of DNA, RNA and chromatin abnormalities can give profound insight into the mechanisms of tumorigenesis, and there are many techniques for analyzing the genomes of these tumors. Here, I present an overview of the available technologies for analyzing tumor genomes in the zebrafish, including array-based methods as well as next-generation sequencing technologies. I also discuss the ways in which zebrafish tumor genomes can be compared to human genomes using cross-species oncogenomics, which act to filter genomic noise and ultimately uncover central drivers of malignancy. Finally, I discuss downstream analytic tools, including network analysis, that can help to organize the alterations into coherent biological frameworks that can then be investigated further.
Affiliation(s)
- Richard M White
- Memorial Sloan Kettering Cancer Center, 415 East 68th Street, New York, NY, 10065, USA.
25
Rubio M, Ballester AR, Olivares PM, Castro de Moura M, Dicenta F, Martínez-Gómez P. Gene Expression Analysis of Plum pox virus (Sharka) Susceptibility/Resistance in Apricot (Prunus armeniaca L.). PLoS One 2015; 10:e0144670. [PMID: 26658051 PMCID: PMC4684361 DOI: 10.1371/journal.pone.0144670] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 09/25/2015] [Accepted: 11/20/2015] [Indexed: 11/18/2022]
Abstract
RNA-Seq has proven to be a very powerful tool in the analysis of the Plum pox virus (PPV, sharka disease)/Prunus interaction. This technique is an important complement to other means of studying genomics. In this work, an analysis of gene expression associated with resistance/susceptibility to PPV in apricot is performed. RNA-Seq has been applied to analyse the gene expression changes induced by PPV infection in leaves from two full-sib apricot genotypes, “Rojo Pasión” and “Z506-7”, resistant and susceptible to PPV, respectively. Transcriptomic analyses revealed the existence of more than 2,000 genes related to the pathogen response and resistance to PPV in apricot. These results showed that the response to infection by the virus in the susceptible genotype is associated with an induction of genes involved in pathogen resistance, such as allene oxide synthase, S-adenosylmethionine synthetase 2 and the major MLP-like protein 423. Over-expression of the Dicer protein 2a may indicate the suppression of a gene silencing mechanism of the plant by the PPV HCPro and P1 proteins. In addition, 164 genes involved in resistance mechanisms were identified in apricot, 49 of which are located in the PPVres region (scaffold 1, positions 8,050,804 to 8,244,925), which is responsible for PPV resistance in apricot. Among these genes are several MATH domain-containing genes, although other genes inside (Pleiotropic drug resistance 9 gene) or outside (CAP, Cysteine-rich secretory proteins, Antigen 5 and Pathogenesis-related 1 protein; and LEA, Late embryogenesis abundant protein) the PPVres region could also be involved in the resistance.
Affiliation(s)
- Manuel Rubio
- Department of Plant Breeding, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), PO Box 164, E-30100 Espinardo (Murcia) Spain
- Ana Rosa Ballester
- Department of Food Science, Instituto de Agroquímica y Tecnología de Alimentos (IATA-CSIC), Avda. Agustín Escardino 7, 46980 Paterna (Valencia) Spain
- Pedro Manuel Olivares
- Department of Plant Breeding, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), PO Box 164, E-30100 Espinardo (Murcia) Spain
- Manuel Castro de Moura
- aScidea Computational Biology Solutions, S.L. Parc de Reserca UAB, Edifici Eureka. 08193 Bellaterra (Cerdanyola del Vallés), Barcelona, Spain
- Federico Dicenta
- Department of Plant Breeding, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), PO Box 164, E-30100 Espinardo (Murcia) Spain
- Pedro Martínez-Gómez
- Department of Plant Breeding, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), PO Box 164, E-30100 Espinardo (Murcia) Spain
26
Hirsch CD, Springer NM, Hirsch CN. Genomic limitations to RNA sequencing expression profiling. Plant J 2015; 84:491-503. [PMID: 26331235 DOI: 10.1111/tpj.13014] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/28/2015] [Accepted: 08/25/2015] [Indexed: 05/24/2023]
Abstract
The field of genomics has grown rapidly with the advent of massively parallel sequencing technologies, allowing for novel biological insights with regard to genomic, transcriptomic, and epigenomic variation. One widely utilized application of high-throughput sequencing is transcriptional profiling using RNA sequencing (RNAseq). Understanding the limitations of a technology is critical for accurate biological interpretations, and clear interpretation of RNAseq data can be difficult in species with complex genomes. To understand the limitations on accurate profiling of expression levels, we simulated RNAseq reads from annotated gene models in several plant species including Arabidopsis, Brachypodium, maize, potato, rice, soybean, and tomato. The simulated reads were aligned using various parameters, such as unique versus multiple read alignments. This allowed the identification of genes recalcitrant to RNAseq analyses because their expression levels were over- and/or underestimated. In maize, over 25% of genes deviated by more than 20% from the expected count values, suggesting the need for cautious interpretation of RNAseq data for certain genes. The reasons identified for deviation from expected expression varied between species due to differences in genome structure including, but not limited to, genes encoding short transcripts, overlapping gene models, and gene family size. Utilizing existing empirical datasets, we demonstrate the potential for biological misinterpretation resulting from inclusion of 'flagged genes' in analyses. While RNAseq is a powerful tool for understanding biology, there are limitations to this technology that need to be understood in order to improve our biological interpretations.
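The criterion described above, in which a gene is flagged when the count recovered from its simulated reads deviates by more than 20% from the expected count, amounts to a relative-deviation test. A minimal sketch, assuming per-gene expected and observed counts are already available as dictionaries; the function name and the treatment of genes with zero expected reads are assumptions, not details from the study.

```python
from typing import Dict, List


def flag_genes(expected: Dict[str, float], observed: Dict[str, float],
               max_rel_dev: float = 0.20) -> List[str]:
    """Return IDs of genes whose observed count deviates from the expected count
    by more than `max_rel_dev` (relative), sorted alphabetically.

    Genes with an expected count of zero are flagged if any reads were assigned
    to them (an assumption; relative deviation is undefined there).
    """
    flagged = []
    for gene, exp in expected.items():
        obs = observed.get(gene, 0)  # genes missing from `observed` count as 0 reads
        if exp == 0:
            if obs > 0:
                flagged.append(gene)
        elif abs(obs - exp) / exp > max_rel_dev:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical counts for illustration only.
expected = {"g1": 100, "g2": 100, "g3": 50, "g4": 0}
observed = {"g1": 110, "g2": 70, "g3": 65, "g4": 3}
flagged = flag_genes(expected, observed)
print(flagged)                       # ['g2', 'g3', 'g4']
print(len(flagged) / len(expected))  # 0.75
```

Over- and underestimation are treated symmetrically here, matching the abstract's wording of genes with "over- and/or under-estimated expression levels".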
Affiliation(s)
- Cory D Hirsch
- Department of Plant Biology, University of Minnesota, St Paul, MN, 55108, USA
- Nathan M Springer
- Department of Plant Biology, University of Minnesota, St Paul, MN, 55108, USA
- Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN, 55108, USA
27
Giannopoulou EG, Elemento O, Ivashkiv LB. Use of RNA sequencing to evaluate rheumatic disease patients. Arthritis Res Ther 2015; 17:167. [PMID: 26126608 PMCID: PMC4488125 DOI: 10.1186/s13075-015-0677-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/13/2022]
Abstract
Studying the factors that control gene expression is of substantial importance for rheumatic diseases with poorly understood etiopathogenesis. In the past, gene expression microarrays have been used to measure transcript abundance on a genome-wide scale in a particular cell, tissue or organ. Microarray analysis has led to gene signatures that differentiate rheumatic diseases, and stages of a disease, as well as response to treatments. Nowadays, however, with the advent of next-generation sequencing methods, massively parallel sequencing of RNA tends to be the technology of choice for gene expression profiling, due to several advantages over microarrays, as well as for the detection of non-coding transcripts and alternative splicing events. In this review, we describe how RNA sequencing enables unbiased interrogation of the abundance and complexity of the transcriptome, and present a typical experimental workflow and bioinformatics tools that are often used for RNA sequencing analysis. We also discuss different uses of this next-generation sequencing technology to evaluate rheumatic disease patients and investigate the pathogenesis of rheumatic diseases such as rheumatoid arthritis, systemic lupus erythematosus, juvenile idiopathic arthritis and Sjögren’s syndrome.
Affiliation(s)
- Eugenia G Giannopoulou
- Biological Sciences Department, New York City College of Technology, City University of New York, New York, NY, 11201, USA; Arthritis and Tissue Degeneration Program and the David Z Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA.
- Olivier Elemento
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine and Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10021, USA.
- Lionel B Ivashkiv
- Arthritis and Tissue Degeneration Program and the David Z Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA.
28
Adams DJ, Doran AG, Lilue J, Keane TM. The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes. Mamm Genome 2015; 26:403-12. [PMID: 26123534 DOI: 10.1007/s00335-015-9579-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 02/25/2015] [Accepted: 06/11/2015] [Indexed: 12/16/2022]
Abstract
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertions/deletions, larger structural variants including their fine-scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools, which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Affiliation(s)
- David J Adams
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
- Anthony G Doran
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
- Jingtao Lilue
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
- Thomas M Keane
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
29
Rutkowski AJ, Erhard F, L'Hernault A, Bonfert T, Schilhabel M, Crump C, Rosenstiel P, Efstathiou S, Zimmer R, Friedel CC, Dölken L. Widespread disruption of host transcription termination in HSV-1 infection. Nat Commun 2015; 6:7126. [PMID: 25989971 PMCID: PMC4441252 DOI: 10.1038/ncomms8126] [Citation(s) in RCA: 198] [Impact Index Per Article: 19.8] [Received: 01/17/2015] [Accepted: 04/07/2015] [Indexed: 02/07/2023]
Abstract
Herpes simplex virus 1 (HSV-1) is an important human pathogen and a paradigm for virus-induced host shut-off. Here we show that global changes in transcription and RNA processing and their impact on translation can be analysed in a single experimental setting by applying 4sU-tagging of newly transcribed RNA and ribosome profiling to lytic HSV-1 infection. Unexpectedly, we find that HSV-1 triggers the disruption of transcription termination of cellular, but not viral, genes. This results in extensive transcription for tens of thousands of nucleotides beyond poly(A) sites and into downstream genes, leading to novel intergenic splicing between exons of neighbouring cellular genes. As a consequence, hundreds of cellular genes seem to be transcriptionally induced but are not translated. In contrast to previous reports, we show that HSV-1 does not inhibit co-transcriptional splicing. Our approach thus substantially advances our understanding of HSV-1 biology and establishes HSV-1 as a model system for studying transcription termination.
Herpes simplex virus 1 (HSV-1) efficiently shuts down host gene expression in infected cells. Here Rutkowski et al. analyse the genome-wide changes in transcription and translation in infected cells, and show that HSV-1 triggers an extensive disruption of transcription termination of cellular genes.
Affiliation(s)
- Andrzej J Rutkowski
- Division of Infectious Diseases, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Florian Erhard
- Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany
- Anne L'Hernault
- Division of Infectious Diseases, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Thomas Bonfert
- Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany
- Markus Schilhabel
- Institut für Klinische Molekularbiologie, Christian-Albrechts-Universität Kiel, Schittenhelmstraße 12, 24105 Kiel, Germany
- Colin Crump
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Philip Rosenstiel
- Institut für Klinische Molekularbiologie, Christian-Albrechts-Universität Kiel, Schittenhelmstraße 12, 24105 Kiel, Germany
- Stacey Efstathiou
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Ralf Zimmer
- Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany
- Caroline C Friedel
- Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany
- Lars Dölken
- Division of Infectious Diseases, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK; Institut für Virologie, Julius-Maximilians-Universität Würzburg, Versbacher Straße 7, 97078 Würzburg, Germany
30
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022]
Abstract
Tremendous advances in next-generation sequencing (NGS) and high-throughput omics methods have brought us one step closer to a mechanistic understanding of complex diseases at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. These mechanisms, namely genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since they are often associated with the same disease and target the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered on the p53 protein. The review highlights the multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Affiliation(s)
- Hongzhu Cui
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Andi Dhroso
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Nathan Johnson
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Dmitry Korkin
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
31
Abstract
Transposable elements (TEs) are an important factor shaping eukaryotic genomes. Although a significant body of research has been conducted on the abundance of TEs in nuclear genomes, TEs in mitochondrial genomes remain elusive. In this study, we successfully assembled 28 complete yeast mitochondrial genomes and took advantage of the power of population genomics to determine mobile DNAs and their propensity. We have observed compelling evidence of GC clusters propagating within the mitochondrial genome and being horizontally transferred between species. These mitochondrial TEs experience rapid diversification by nucleotide substitution and, more importantly, undergo dynamic merger and shuffling to form new TEs. Given the hypermobile and transformable nature of mitochondrial TEs, our findings open the door to a deeper understanding of eukaryotic mitochondrial genome evolution and the origin of nonautonomous TEs.
32
Bonfert T, Kirner E, Csaba G, Zimmer R, Friedel CC. ContextMap 2: fast and accurate context-based RNA-seq mapping. BMC Bioinformatics 2015; 16:122. [PMID: 25928589 PMCID: PMC4411664 DOI: 10.1186/s12859-015-0557-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 01/23/2015] [Accepted: 03/30/2015] [Indexed: 01/24/2023]
Abstract
Background: Mapping of short sequencing reads is a crucial step in the analysis of RNA sequencing (RNA-seq) data. ContextMap is an RNA-seq mapping algorithm that uses a context-based approach to identify the best alignment for each read and allows parallel mapping against several reference genomes. Results: In this article, we present ContextMap 2, a new and improved version of ContextMap. Its key novel features are: (i) a plug-in structure that allows easily integrating novel short read alignment programs with improved accuracy and runtime; (ii) context-based identification of insertions and deletions (indels); (iii) mapping of reads spanning an arbitrary number of exons and indels. ContextMap 2 using Bowtie, Bowtie 2 or BWA was evaluated on both simulated and real-life data from the recently published RGASP study. Conclusions: We show that ContextMap 2 generally combines similar or higher recall compared to other state-of-the-art approaches with significantly higher precision in read placement and junction and indel prediction. Furthermore, runtime was significantly lower than for the best competing approaches. ContextMap 2 is freely available at http://www.bio.ifi.lmu.de/ContextMap.
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Evelyn Kirner
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Gergely Csaba
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Ralf Zimmer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Caroline C Friedel
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
33
Farkas MH, Au ED, Sousa ME, Pierce EA. RNA-Seq: Improving Our Understanding of Retinal Biology and Disease. Cold Spring Harb Perspect Med 2015; 5:a017152. [PMID: 25722474 PMCID: PMC4561396 DOI: 10.1101/cshperspect.a017152] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/14/2023]
Abstract
Over the past several years, rapid technological advances have allowed for a dramatic increase in our knowledge and understanding of the transcriptional landscape, because of the ability to study gene expression in greater depth and with more detail than previously possible. To this end, RNA-Seq has quickly become one of the most widely used methods for studying transcriptomes of tissues and individual cells. Unlike previously favored analysis methods, RNA-Seq is extremely high-throughput, and is not dependent on an annotated transcriptome, laying the foundation for novel genetic discovery. Additionally, RNA-Seq derived transcriptomes provide a basis for widening the scope of research to identify potential targets in the treatment of retinal disease.
Affiliation(s)
- Michael H Farkas
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, Massachusetts 02114
- Elizabeth D Au
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, Massachusetts 02114
- Maria E Sousa
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, Massachusetts 02114
- Eric A Pierce
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Harvard Medical School, Boston, Massachusetts 02114
34
Rubio M, Rodríguez-Moreno L, Ballester AR, de Moura MC, Bonghi C, Candresse T, Martínez-Gómez P. Analysis of gene expression changes in peach leaves in response to Plum pox virus infection using RNA-Seq. Mol Plant Pathol 2015; 16:164-76. [PMID: 24989162 PMCID: PMC6638525 DOI: 10.1111/mpp.12169] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 05/18/2023]
Abstract
Differences in gene expression were studied after Plum pox virus (PPV, sharka disease) infection in peach GF305 leaves with and without sharka symptoms using RNA-Seq. For each sample, more than 80% of 100-nucleotide paired-end (PE) Illumina reads were aligned on the peach reference genome. In the symptomatic sample, a significant proportion of reads were mapped to PPV reference genomes (1.04% compared with 0.00002% in non-symptomatic leaves), allowing for the ultra-deep assembly of the complete genome of the PPV isolate used (9775 nucleotides, missing only 11 nucleotides at the 5' genome end). In addition, significant alternative splicing events were detected in 359 genes and 12 990 single nucleotide polymorphisms (SNPs) were identified, 425 of which could be annotated. Gene ontology annotation revealed that the high-ranking mRNA target genes associated with the expression of sharka symptoms are mainly related to the response to biotic stimuli, to lipid and carbohydrate metabolism and to the negative regulation of catalytic activity. A greater number of differentially expressed genes were observed in the early asymptomatic phase of PPV infection in comparison with the symptomatic phase. These early infection events were associated with the induction of genes related to pathogen resistance, such as jasmonic acid, chitinases, cytokinin glucosyl transferases and Lys-M proteins. Once the virus had accumulated, the overexpression of Dicer protein 2a genes suggested a gene silencing plant response that was suppressed by the virus HCPro and P1 proteins. These results illustrate the dynamic nature of the peach-PPV interaction at the transcriptome level and confirm that sharka symptom expression is a complex process that can be understood on the basis of changes in plant gene expression.
Affiliation(s)
- Manuel Rubio
- Department of Plant Breeding, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), PO Box 164, E-30100, Espinardo-Murcia, Spain
35
Sequence alignment tools: one parallel pattern to rule them all? Biomed Res Int 2014; 2014:539410. [PMID: 25147803 PMCID: PMC4131566 DOI: 10.1155/2014/539410] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/07/2014] [Revised: 06/03/2014] [Accepted: 06/21/2014] [Indexed: 11/17/2022]
Abstract
In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their ports onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are freed from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and gain more opportunity for seamless performance tuning. In this work, we show use cases in which, by using a high-level approach to parallelise NGS tools, it is possible to obtain comparable or even better absolute performance on all datasets used.
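One way to picture a "single parallel paradigm" for alignment is a task farm: an emitter hands independent reads to a pool of identical workers, each running the same aligner, and a collector gathers the results in input order. The sketch below assumes this farm interpretation (the abstract does not name the pattern), uses Python's standard `concurrent.futures` rather than FastFlow, and replaces a real aligner with a hypothetical exact-match function `align_read` over a toy reference.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy reference; a real aligner would index a whole genome.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTACG"


def align_read(read: str) -> int:
    """Hypothetical stand-in for an aligner: return the first exact-match
    offset of `read` in REFERENCE, or -1 if it does not occur."""
    return REFERENCE.find(read)


def farm_align(reads, workers=4):
    """Task-farm pattern: the executor acts as emitter and worker pool;
    `map` acts as the collector and preserves the input order of reads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(align_read, reads))


if __name__ == "__main__":
    print(farm_align(["ACGT", "TAGCC", "GGGG", "CTTACG"]))  # [0, 8, -1, 19]
```

Because each read is aligned independently, the farm needs no synchronisation beyond handing out tasks and collecting results. For CPU-bound pure-Python work, CPython's global interpreter lock limits the speed-up from threads, so swapping in `ProcessPoolExecutor` would be the usual choice there.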
36
Frischkorn KR, Harke MJ, Gobler CJ, Dyhrman ST. De novo assembly of Aureococcus anophagefferens transcriptomes reveals diverse responses to the low nutrient and low light conditions present during blooms. Front Microbiol 2014; 5:375. [PMID: 25104951 PMCID: PMC4109616 DOI: 10.3389/fmicb.2014.00375] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 03/04/2014] [Accepted: 07/03/2014] [Indexed: 12/23/2022]
Abstract
Transcriptome profiling was performed on the harmful algal bloom-forming pelagophyte Aureococcus anophagefferens strain CCMP 1850 to assess responses to common stressors for dense phytoplankton blooms: low inorganic nitrogen concentrations, low inorganic phosphorus concentrations, low light levels, and a replete control. The de novo assemblies of pooled reads from all treatments reconstructed ~54,000 transcripts using Trinity, and ~31,000 transcripts using ABySS. Comparison to the strain CCMP 1984 genome showed that the majority of the gene models were present in both de novo assemblies and that roughly 95% of contigs from both assemblies mapped to the genome, with Trinity capturing slightly more genome content. Sequence reads were mapped back to the de novo assemblies as well as the gene models and differential expression was analyzed using a Bayesian approach called Analysis of Sequence Counts (ASC). On average, 93% of significantly upregulated transcripts recovered by genome mapping were present in the significantly upregulated pool from both de novo assembly methods. Transcripts related to the transport and metabolism of nitrogen were upregulated in the low nitrogen treatment, transcripts encoding enzymes that hydrolyze organic phosphorus or relieve arsenic toxicity were upregulated in the low phosphorus treatment, and transcripts for enzymes that catabolize organic compounds, restructure lipid membranes, or are involved in sulfolipid biosynthesis were upregulated in the low light treatment. A comparison of this transcriptome to the nutrient regulated transcriptional response of CCMP 1984 identified conserved responses between these two strains. These analyses reveal the transcriptional underpinnings of physiological shifts that could contribute to the ecological success of this species in situ: organic matter processing, metal detoxification, lipid restructuring, and photosynthetic apparatus turnover.
Affiliation(s)
- Kyle R Frischkorn
- Department of Earth and Environmental Sciences and the Lamont-Doherty Earth Observatory, Columbia University Palisades, NY, USA
- Matthew J Harke
- School of Marine and Atmospheric Sciences, Stony Brook University Southampton, NY, USA
- Christopher J Gobler
- School of Marine and Atmospheric Sciences, Stony Brook University Southampton, NY, USA
- Sonya T Dyhrman
- Department of Earth and Environmental Sciences and the Lamont-Doherty Earth Observatory, Columbia University Palisades, NY, USA
37
Maji RK, Sarkar A, Khatua S, Dasgupta S, Ghosh Z. PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics 2014; 15:167. [PMID: 24894600 PMCID: PMC4063226 DOI: 10.1186/1471-2105-15-167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/02/2014] [Accepted: 05/07/2014] [Indexed: 12/05/2022]
Abstract
Background: High-throughput next-generation sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates very large datasets, which pose a major challenge to scientists seeking efficient, cost-effective and time-effective ways to analyse them. Further, the different types of NGS data share certain challenging analysis steps. Spliced alignment is one such fundamental step in NGS data analysis; it is computationally intensive as well as time-consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU and memory. Here we introduce PVT (Pipelined Version of TopHat), a modular approach that breaks TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address the shortcomings of TopHat so that large NGS datasets can be analysed efficiently. Results: We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and recorded CPU usage, memory footprint and execution time during spliced alignment. With this information, we designed PVT, a pipelined version of TopHat that removes redundant computational steps during spliced alignment and breaks the job into a pipeline of multiple stages (each comprising one or more steps) to improve resource utilization, thus reducing execution time. Conclusions: PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT resulted in a reduction of the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improvement of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system.
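The general technique behind PVT, splitting a serial job into stages that run concurrently so that stage k can work on one item while stage k+1 works on the previous one, is ordinary pipeline parallelism. The sketch below is not PVT or TopHat code: it is a minimal illustration using only Python's `threading` and `queue` modules, and the stage functions (`trim`, `split_segments`, `count_segments`) are hypothetical stand-ins for alignment steps.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def _stage(func, inbox, outbox):
    """Run one pipeline stage: apply `func` to each item taken from `inbox`
    and forward the result to `outbox` until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, stages):
    """Chain `stages` with FIFO queues, one thread per stage, so that
    different stages can process different items at the same time."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stages standing in for real alignment steps.
def trim(read):
    return read.rstrip("N")  # drop trailing ambiguous bases


def split_segments(read):
    return [read[i:i + 4] for i in range(0, len(read), 4)]


def count_segments(segments):
    return len(segments)


if __name__ == "__main__":
    reads = ["ACGTACGTNN", "TTAGN", "GGCCAATT"]
    print(run_pipeline(reads, [trim, split_segments, count_segments]))  # [2, 1, 2]
```

With one thread per stage and FIFO queues, outputs keep the input order, and throughput is bounded by the slowest stage. In CPython the global interpreter lock limits overlap for pure-Python stages; stages that wait on external programs or I/O would overlap more freely.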
Affiliation(s)
- Zhumur Ghosh
- Bioinformatics Centre, Bose Institute, Kolkata 700054, India.
38
Angelini C, De Canditiis D, De Feis I. Computational approaches for isoform detection and estimation: good and bad news. BMC Bioinformatics 2014; 15:135. [PMID: 24885830 PMCID: PMC4098781 DOI: 10.1186/1471-2105-15-135] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/23/2013] [Accepted: 04/24/2014] [Indexed: 11/10/2022]
Abstract
Background: The main goal of whole-transcriptome analysis is to correctly identify all expressed transcripts within a specific cell or tissue, at a particular stage and condition, to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy, and at low cost. Several computational methods have been proposed to achieve these purposes. However, it is still not clear which promises are already met and which challenges remain open and require further methodological development. Results: We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were used with default parameters. In particular, we considered the effect of three different scenarios: the availability of complete annotation, incomplete annotation, and no annotation at all. Moreover, comparisons were carried out using the methods in three different modes of action. In the first mode, the methods were forced to deal only with isoforms present in the annotation; in the second mode, they were allowed to detect novel isoforms using the annotation as a guide; in the third mode, they operated in a fully data-driven way (although with the support of the alignment to the reference genome). In the latter mode, precision and recall are quite poor. By contrast, results are better with the support of the annotation, even when it is incomplete. Finally, the abundance estimation error often shows a very skewed distribution. Performance strongly depends on the true abundance of the isoforms. Lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly if they are provided in the original annotation as potential isoforms.
Conclusions: Both detection and quantification of all isoforms from RNA-seq data are still hard problems, affected by many factors. Overall, performance varies significantly with the mode of action and the type of available annotation. Methods using complete or partial annotation are able to detect most of the expressed isoforms, even though the number of false positives is often high. Fully data-driven approaches require more attention, at least for complex eukaryotic genomes. Improvements are desirable especially for isoform quantification and for the detection of low-abundance isoforms.
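Precision and recall, the measures the simulation study reports for isoform detection, compare the set of isoforms a method reports with the set truly expressed: precision = TP/(TP+FP) and recall = TP/(TP+FN). The sketch below assumes isoforms are matched by identifier, which is a simplification (a study may instead match on transcript structure), and the identifiers are hypothetical.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted isoform IDs against the
    IDs of truly expressed isoforms (matching by ID is an assumption)."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # reported and truly expressed
    fp = len(predicted - truth)  # reported but not expressed
    fn = len(truth - predicted)  # expressed but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Hypothetical example: 3 of 4 reported isoforms are real; 3 of 5 real ones are found.
truth = {"tx1", "tx2", "tx3", "tx4", "tx5"}
predicted = {"tx1", "tx2", "tx3", "tx9"}
print(precision_recall(predicted, truth))  # (0.75, 0.6)
```

A method that reports many novel isoforms in the fully data-driven mode tends to raise FP (lowering precision), while lowly expressed isoforms it misses raise FN (lowering recall).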
39
Cao MD, Balasubramanian S, Boden M. Sequencing technologies and tools for short tandem repeat variation detection. Brief Bioinform 2014; 16:193-204. [DOI: 10.1093/bib/bbu001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/23/2022]
40
Pesson M, Eymin B, De La Grange P, Simon B, Corcos L. A dedicated microarray for in-depth analysis of pre-mRNA splicing events: application to the study of genes involved in the response to targeted anticancer therapies. Mol Cancer 2014; 13:9. [PMID: 24428911 PMCID: PMC3899606 DOI: 10.1186/1476-4598-13-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/30/2013] [Accepted: 01/09/2014] [Indexed: 12/22/2022]
Abstract
Alternative pre-mRNA splicing (AS) widely expands proteome diversity through the combinatorial assembly of exons. Large-scale analysis of AS using splice-sensitive microarrays is a highly efficient method to detect the majority of known and predicted alternative transcripts for a given gene. The response to targeted anticancer therapies cannot easily be anticipated without prior knowledge of the expression, by the tumor, of target proteins or genes. To analyze in depth the transcript structure and levels of genes involved in these responses, including AKT1-3, HER1-4, HIF1A, PIK3CA, PIK3R1-2, VEGFA-D and PIR, we engineered a dedicated gene chip with an average coverage of 185 probes per gene and, notably, exon-exon junction probes. As a proof of concept, we demonstrated the ability of such a chip to detect the effects of over-expressed SRSF2 RNA binding protein on the structure and abundance of mRNA products in H358 lung cancer cells conditionally over-expressing SRSF2. Major splicing changes were observed, including in HER1/EGFR pre-mRNA, which were also seen in human lung cancer samples over-expressing the SRSF2 protein. In addition, we showed that variations in HER1/EGFR pre-mRNA splicing triggered by SRSF2 overexpression in H358 cells resulted in a drop in HER1/EGFR protein level, which correlated with increased sensitivity to gefitinib, an EGFR tyrosine kinase inhibitor. We propose, therefore, that this novel tool could be especially relevant for clinical applications, with the aim of predicting the response before treatment.
Affiliation(s)
- Laurent Corcos
- UMR INSERM U1078-UBO, Equipe ECLA, Faculté de Médecine, 22 Avenue Camille Desmoulins, 29200 Brest, France.
41
Zheng CL, Kawane S, Bottomly D, Wilmot B. Analysis considerations for utilizing RNA-Seq to characterize the brain transcriptome. Int Rev Neurobiol 2014; 116:21-54. [PMID: 25172470 DOI: 10.1016/b978-0-12-801105-8.00002-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
RNA-Seq allows one to examine not only gene expression but also the expression of noncoding RNAs, alternative splicing, and allele-specific expression. With this increased sensitivity and dynamic range come computational and statistical considerations that are highly dependent on the biological question being asked. We highlight these to provide an overview of their importance and the impact they can have on downstream interpretation of the brain transcriptome.
Affiliation(s)
- Christina L Zheng
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA.
- Sunita Kawane
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
- Daniel Bottomly
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
- Beth Wilmot
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
42
Passetti F, Jorge NAN, Durham A. Using bioinformatics tools to study the role of microRNA in cancer. Methods Mol Biol 2014; 1168:99-116. [PMID: 24870133 DOI: 10.1007/978-1-4939-0847-9_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
High-throughput sequencing (HTS) has emerged as a promising method to study gene expression in neoplastic and normal tissues. Using HTS, many research groups have described transcript variants and discovered new transcribed loci and noncoding RNAs, including microRNAs. In oncology, expression profiling of microRNAs in matched tumor and normal tissues has been used to detect differential expression of microRNAs in cancer. We present one approach to help laboratories with little bioinformatics support analyze microRNA HTS data, focused on oncology. This approach can also be adapted to study other systems.
Affiliation(s)
- Fabio Passetti
- Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Câncer (INCA), Rio de Janeiro, RJ, Brazil.
43
González E, Joly S. Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes. BMC Res Notes 2013; 6:503. [PMID: 24298906 PMCID: PMC4222115 DOI: 10.1186/1756-0500-6-503] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 06/18/2013] [Accepted: 11/20/2013] [Indexed: 12/26/2022]
Abstract
Background: High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies are an important downstream analysis that often follows de novo transcriptome assembly. While much attention has been given to bioinformatics tools for differential gene expression, little has yet been paid to the impact of the sequence data themselves used in these pipelines.
Results: We tested how using reads that differ from those used to assemble a de novo transcriptome (in length and in pairing attributes) could affect differential expression (DE) results. To investigate this, we created artificial datasets from the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses, and because all samples come from the same sequencing run, any DE of genes or isoforms can be interpreted as a false positive resulting from sequence attributes. Whereas the false positive rate for differential gene expression did not appear to be strongly affected by sequencing strategy (maximum 3.5%), it reached 12.2% or 28.1% for differential isoform expression, depending on the pipeline used. The choice of paired-end vs. single-end sequencing had a much greater impact on false positives than sequence length.
Conclusion: In light of these false positive rates, we recommend using paired-end rather than single-end sequences in differential expression studies, even though the impact is less serious for differential gene expression.
Affiliation(s)
- Emmanuel González
- Institut de recherche en biologie végétale, Université de Montréal, 4101 Sherbrooke E, Montréal, H1X 2B2, (QC), Canada.
44
Li JW, Bolser D, Manske M, Giorgi FM, Vyahhi N, Usadel B, Clavijo BJ, Chan TF, Wong N, Zerbino D, Schneider MV. The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability. Brief Bioinform 2013; 14:548-55. [PMID: 23793381 PMCID: PMC3771235 DOI: 10.1093/bib/bbt045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be taken up by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in the quantity of sequence data. Current resources for NGS informatics are extremely fragmented, and finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of them satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods into customized pipelines, an approach helped by the open-source nature of many NGS programs. Drawing on community spirit and using the Wikipedia framework, we have initiated a collaborative NGS resource: the NGS WikiBook. We have collected enough text to encourage a broader community to contribute to it. Users can search, browse, edit, and create new content, which facilitates self-learning and feedback to the community. The overall structure and style of this dynamic material are designed for bench biologists and non-bioinformaticians. The flexibility of online material allows readers to skip details on a first read yet have immediate access to the information they need. Each chapter comes with practical exercises so that readers can familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
Affiliation(s)
- Jing-Woei Li
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Tel.: +852-39431302;
45
Spaethling JM, Eberwine JH. Single-cell transcriptomics for drug target discovery. Curr Opin Pharmacol 2013; 13:786-90. [PMID: 23725882 DOI: 10.1016/j.coph.2013.04.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/19/2013] [Revised: 04/25/2013] [Accepted: 04/27/2013] [Indexed: 10/26/2022]
Abstract
Single-cell sequencing is still in its relative infancy, although an unprecedented amount of information is already being generated. These techniques are providing new insight into intercellular variability and identifying previously unrecognized drug targets. As more groups take an interest in this fruitful technique, new sample preparation techniques, sequencing platforms, and bioinformatics tools are being developed that continue to improve the quantity and quality of the data generated in these studies. Advances in cell harvesting (in vivo pipette), sample preparation, and sequencing (Illumina HiSeq 2500/MiSeq, Ion Torrent PGM, Pacific Biosciences RS) are allowing previously untestable questions to be answered and broadening access to these technologies.
Affiliation(s)
- Jennifer M Spaethling
- Department of Pharmacology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA