1. Waters MR, Inkman M, Jayachandran K, Kowalchuk RM, Robinson C, Schwarz JK, Swamidass SJ, Griffith OL, Szymanski JJ, Zhang J. GAiN: An integrative tool utilizing generative adversarial neural networks for augmented gene expression analysis. PATTERNS (NEW YORK, N.Y.) 2024; 5:100910. [PMID: 38370125 PMCID: PMC10873154 DOI: 10.1016/j.patter.2023.100910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/23/2023] [Accepted: 12/07/2023] [Indexed: 02/20/2024]
Abstract
Big genomic data and artificial intelligence (AI) are ushering in an era of precision medicine, providing opportunities to study previously under-represented subtypes and rare diseases rather than categorize them as variances. However, clinical researchers face challenges in accessing such novel technologies and reliable methods for studying small datasets or subcohorts with unique phenotypes. To address this need, we developed an integrative approach, GAiN, that captures patterns of gene expression from small datasets using an ensemble of generative adversarial networks (GANs) while leveraging big population data. Where conventional biostatistical methods fail, GAiN reliably discovers differentially expressed genes (DEGs) and enriched pathways between two cohorts with limited numbers of samples (n = 10) when benchmarked against a gold standard. GAiN is freely available on GitHub and may serve as a crucial tool for gene expression analysis in scenarios with limited samples, such as rare diseases, under-represented populations, or limited investigator resources.
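For context on the "conventional biostatistical methods" GAiN is benchmarked against: a typical DEG workflow tests each gene separately and then controls the false discovery rate across all genes. The sketch below shows only that second step, the standard Benjamini-Hochberg adjustment. It is not part of GAiN, and the per-gene p-values are assumed to come from whatever two-group test the analyst has chosen.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Gene indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20]  # illustrative per-gene p-values
    qvals = benjamini_hochberg(pvals)
    print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
```

With these illustrative inputs only the first gene passes a q < 0.05 cutoff. With very few samples per group, as in the n = 10 setting above, the raw p-values themselves tend to be weak, which this correction cannot fix.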
Affiliation(s)
- Michael R. Waters
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Matthew Inkman
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Kay Jayachandran
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Clifford Robinson
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Julie K. Schwarz
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63108, USA
- S. Joshua Swamidass
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63105, USA
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63105, USA
- Obi L. Griffith
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jeffrey J. Szymanski
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jin Zhang
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Institute for Informatics (I), Washington University School of Medicine, St. Louis, MO 63110, USA
2. Systems Biology Approaches to Decipher the Underlying Molecular Mechanisms of Glioblastoma Multiforme. Int J Mol Sci 2021; 22:ijms222413213. [PMID: 34948010 PMCID: PMC8706582 DOI: 10.3390/ijms222413213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 11/30/2021] [Accepted: 12/04/2021] [Indexed: 11/29/2022]
Abstract
Glioblastoma multiforme (GBM) is one of the most malignant central nervous system tumors, with a poor prognosis and low survival rate. Therefore, deciphering the molecular mechanisms underlying GBM progression and identifying the key driver genes responsible for it are crucial for discovering potential diagnostic markers and therapeutic targets. In this context, access to diverse biological data, the development of new methodologies, and the construction of biological networks that integrate multi-omics data are necessary for gaining insight into the onset and progression of GBM. Systems biology approaches have become indispensable for analyzing heterogeneous high-throughput omics data, extracting essential information, and generating new hypotheses from biomedical data. This review summarizes current knowledge of GBM and discusses multi-omics data and recent systems analyses in GBM aimed at identifying key biological functions and genes. This knowledge can inform efficient diagnostic and treatment strategies and help achieve personalized medicine for GBM.
3. Huang G, Zhang H, Qu Y, Huang K, Gong X, Wei J, Du H. ARMT: An automatic RNA-seq data mining tool based on comprehensive and integrative analysis in cancer research. Comput Struct Biotechnol J 2021; 19:4426-4434. [PMID: 34471489 PMCID: PMC8379379 DOI: 10.1016/j.csbj.2021.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 07/19/2021] [Accepted: 08/06/2021] [Indexed: 11/02/2022]
Abstract
The comprehensive and integrative analysis of RNA-seq data across different molecular layers and diverse samples holds promise for addressing the full-scale complexity of biological systems. Recent advances in gene set variation analysis (GSVA) provide exciting opportunities to reveal the specific biological processes at work in cancer samples. However, a tool is still urgently needed that combines GSVA with analyses of different molecular characteristics and of patient prognosis to reveal the biological processes of disease comprehensively. Here, we develop ARMT, an automatic tool for RNA-seq data analysis. ARMT is an efficient, integrative tool with a user-friendly interface for comprehensively analyzing the molecular characteristics of single genes and gene sets based on transcriptome and genomic data; it bridges genes and pathways at a deeper level to further accelerate scientific findings. ARMT can be installed easily from https://github.com/Dulab2020/ARMT.
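The abstract positions ARMT around GSVA, which estimates each gene's expression distribution across samples with a kernel method and then computes a Kolmogorov-Smirnov-like random-walk enrichment statistic for each gene set in each sample. That is too involved for a short example. As a deliberately simplified stand-in (not GSVA, and not ARMT's code), the sketch below scores each sample by the mean z-score of a gene set's member genes. It shows the same reshaping of a gene-by-sample matrix into a gene-set-by-sample matrix. The dictionary-based input format is an assumption made for brevity.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_gene_set_scores(
    expr: Dict[str, List[float]],     # gene -> expression values, one per sample
    gene_sets: Dict[str, List[str]],  # gene set name -> member genes
) -> Dict[str, List[float]]:
    """Score each sample for each gene set as the mean z-score of member genes."""
    if not expr:
        return {}
    # Standardize each gene across samples; constant genes contribute 0.
    z: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    n_samples = len(next(iter(expr.values())))
    scores: Dict[str, List[float]] = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]  # skip genes missing from the matrix
        if not present:
            continue  # a set with no measured genes gets no score
        scores[name] = [mean(z[g][s] for g in present) for s in range(n_samples)]
    return scores
```

A set whose genes rise together across samples gets a score that rises too, while a set of constant or unmeasured genes is flat or omitted. Unlike GSVA, this score is sensitive to outlier samples and ignores gene-to-gene correlation within the set.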
Affiliation(s)
- Guanda Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Haibo Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yimo Qu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Kaitang Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Xiaocheng Gong
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jinfen Wei
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Hongli Du
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
4. Stubbs FE, Conway-Campbell BL, Lightman SL. Thirty years of neuroendocrinology: Technological advances pave the way for molecular discovery. J Neuroendocrinol 2019; 31:e12653. [PMID: 30362285 DOI: 10.1111/jne.12653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/03/2018] [Revised: 10/16/2018] [Accepted: 10/21/2018] [Indexed: 12/12/2022]
Abstract
Since the 1950s, the systems-level interactions between the hypothalamus, pituitary and end organs such as the adrenal, thyroid and gonads have been well known; however, it is only over the last three decades that advances in molecular biology and information technology have provided a tremendous expansion of knowledge at the molecular level. Neuroendocrinology has benefitted from developments in molecular genetics, epigenetics and epigenomics, and most recently optogenetics and pharmacogenetics. This has enabled a new understanding of gene regulation, transcription, translation and post-translational regulation, which should help direct the development of drugs to treat neuroendocrine-related diseases.
Affiliation(s)
- Felicity E Stubbs
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
- Becky L Conway-Campbell
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
- Stafford L Lightman
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
5. Weber L, Maßberg D, Becker C, Altmüller J, Ubrig B, Bonatz G, Wölk G, Philippou S, Tannapfel A, Hatt H, Gisselmann G. Olfactory Receptors as Biomarkers in Human Breast Carcinoma Tissues. Front Oncol 2018; 8:33. [PMID: 29497600 PMCID: PMC5818398 DOI: 10.3389/fonc.2018.00033] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 09/14/2017] [Accepted: 01/31/2018] [Indexed: 12/20/2022]
Abstract
Olfactory receptors (ORs) are known to be expressed in a variety of human tissues, where they act on different physiological processes such as cell migration, proliferation, and secretion, and they have been found to function as biomarkers for carcinoma tissues of the prostate, lung, and small intestine. In this study, we analyzed the OR expression profiles of several different carcinoma tissues, with a focus on breast cancer. OR2B6 expression was detectable in breast carcinoma tissues: transcripts of OR2B6 were detected in 73% of all breast carcinoma cell lines and in over 80% of all breast carcinoma tissues analyzed. Interestingly, no OR2B6 expression was observed in healthy tissues. Immunohistochemical staining of OR2B6 in breast carcinoma tissues revealed a distinct staining pattern in carcinoma cells. Furthermore, we detected a fusion transcript containing part of the coding exon of OR2B6 as part of a splice variant of the histone HIST1H2BO transcript. In addition, OR expression patterns in cancer tissues and cell lines derived from lung, pancreas, and brain were compared with those of the corresponding healthy tissues. The number of ORs detected in lung carcinoma tissues was significantly reduced compared with the surrounding healthy tissues. In pancreatic carcinoma tissues, OR4C6 was considerably more highly expressed than in the respective healthy tissues. We identified OR2B6 as a potential biomarker for breast carcinoma tissues.
Affiliation(s)
- Lea Weber
- Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Désirée Maßberg
- Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Christian Becker
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Janine Altmüller
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Burkhard Ubrig
- Clinic for Urology, Augusta-Kranken-Anstalt Bochum, Bochum, Germany
- Gabriele Bonatz
- Clinic of Gynaecology, Augusta-Kranken-Anstalt Bochum, Bochum, Germany
- Gerhard Wölk
- Clinic of Gynaecology, Herz-Jesu-Krankenhaus Dernbach, Dernbach, Germany
- Stathis Philippou
- Department of Pathology and Cytology, Augusta-Kranken-Anstalt Bochum, Bochum, Germany
- Andrea Tannapfel
- Institute for Pathology, Ruhr-University Bochum, Bochum, Germany
- Hanns Hatt
- Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Günter Gisselmann
- Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
6. Low-cost, Low-bias and Low-input RNA-seq with High Experimental Verifiability based on Semiconductor Sequencing. Sci Rep 2017; 7:1053. [PMID: 28432352 PMCID: PMC5430657 DOI: 10.1038/s41598-017-01165-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/16/2016] [Accepted: 03/27/2017] [Indexed: 12/13/2022]
Abstract
Low-input RNA-seq can profile gene expression from a limited number of cells, especially when single-cell variation is not the aim. However, pre-amplification-based and molecule index-based library construction methods either increase bias or require higher sequencing throughput. Here we demonstrate a simple, low-cost, low-bias and low-input RNA-seq method using Ion Torrent semiconductor sequencing (LIEA RNA-seq). We also developed a highly accurate and error-tolerant spliced mapping algorithm, FANSe2splice, to map single-end reads to the reference genome with better experimental verifiability than previous spliced mappers. Combining these experimental and computational advances, our solution is comparable to bulk mRNA-seq in quantification, reliably detects splice junctions, and minimizes bias with far fewer mappable reads.
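The abstract does not describe how FANSe2splice works internally. As a generic illustration of how a spliced aligner reports the junctions it finds, the sketch below extracts intron coordinates from a SAM-style CIGAR string. In the SAM format the `N` operation marks a skipped reference region, which in RNA-seq alignments is an intron. Coordinates are 1-based, matching SAM's POS field. The function name and the example reads are hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (intron_start, intron_end), 1-based inclusive, for each N operation."""
    junctions = []
    ref = pos  # reference coordinate of the next aligned base
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            # The skipped region starts at the current reference position.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


print(junctions_from_cigar(100, "50M200N50M"))  # [(150, 349)]
```

Soft clips (`S`) and insertions (`I`) do not move the reference cursor, so a read such as `10S40M100N30M2I20M500N10M` still yields correct coordinates for both of its introns. Counting how many reads support each (start, end) pair is the usual next step toward calling junctions.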
7. Gan RC, Chen TW, Wu TH, Huang PJ, Lee CC, Yeh YM, Chiu CH, Huang HD, Tang P. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms. BMC Bioinformatics 2016; 17:513. [PMID: 28155708 PMCID: PMC5260104 DOI: 10.1186/s12859-016-1366-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/23/2023]
Abstract
Background Next-generation sequencing promises de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genome sequences, and even fewer have well-defined or curated annotations. For transcriptome studies of organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Results Here, we propose a new analysis strategy and quantification methods that not only generate a virtual reference from sequencing data but also enable comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated to serve as a reference transcriptome. Using the same reference, normalized quantification values, including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM), can be obtained that are comparable across transcriptome datasets. To demonstrate the feasibility of our strategy, we implemented it in the web service PARRoT (Pipeline for Analyzing RNA Reads of Transcriptomes), which analyzes gene expression profiles for two transcriptome sequencing datasets. To aid biological interpretation of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in the SwissProt and NR databases and assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within three hours.
Conclusions In this study, we proposed and implemented a strategy for analyzing transcriptomes from organisms without a reference genome, which makes it possible to quantify and compare transcriptome profiles through a homology-based virtual transcriptome reference. By using this homology-based reference, our strategy effectively avoids problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on comparative genomics for non-model organisms. We have implemented PARRoT as a web service, freely available at http://parrot.cgu.edu.tw.
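The abstract names eRPKM and eTPM but does not say how PARRoT estimates them. For reference, the sketch below computes the standard RPKM and TPM definitions from read counts and transcript lengths, the quantities those "estimated" values are named after. It assumes counts and lengths are already available per virtual transcript. The transcript names and numbers are illustrative.

```python
from typing import Dict, Tuple


def rpkm_and_tpm(
    counts: Dict[str, int],   # transcript -> mapped read count
    lengths: Dict[str, int],  # transcript -> transcript length in bases
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Standard RPKM and TPM from raw counts and transcript lengths."""
    total_reads = sum(counts.values())
    # RPKM: reads per kilobase of transcript per million mapped reads.
    rpkm = {t: c * 1e9 / (lengths[t] * total_reads) for t, c in counts.items()}
    # TPM: normalize by length first, then scale so all values sum to one million.
    rate = {t: c / lengths[t] for t, c in counts.items()}
    rate_sum = sum(rate.values())
    tpm = {t: r / rate_sum * 1e6 for t, r in rate.items()}
    return rpkm, tpm


rpkm, tpm = rpkm_and_tpm({"t1": 100, "t2": 300}, {"t1": 1000, "t2": 2000})
print(rpkm)  # {'t1': 250000.0, 't2': 375000.0}
```

Because TPM values always sum to one million within a sample, they are easier to compare across datasets than RPKM. Comparability also depends on both datasets being quantified against the same reference, which is the point of PARRoT's shared virtual transcriptome.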
Affiliation(s)
- Ruei-Chi Gan
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Ting-Wen Chen
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Timothy H Wu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan
- Po-Jung Huang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Chi-Ching Lee
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Yuan-Ming Yeh
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Cheng-Hsun Chiu
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
- Petrus Tang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Molecular Regulation & Bioinformatics Laboratory, Chang Gung University, Taoyuan, Taiwan
8. Strong MJ, Blanchard E, Lin Z, Morris CA, Baddoo M, Taylor CM, Ware ML, Flemington EK. A comprehensive next generation sequencing-based virome assessment in brain tissue suggests no major virus - tumor association. Acta Neuropathol Commun 2016; 4:71. [PMID: 27402152 PMCID: PMC4940872 DOI: 10.1186/s40478-016-0338-z] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 06/14/2016] [Accepted: 06/15/2016] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing (NGS) can globally interrogate the genetic composition of biological samples in an unbiased yet sensitive manner. The objective of this study was to use the capabilities of NGS to investigate the reported association between glioblastoma multiforme (GBM) and human cytomegalovirus (HCMV). A large-scale, comprehensive virome assessment was performed on publicly available sequencing datasets from The Cancer Genome Atlas (TCGA), including RNA-seq datasets from primary GBM (n = 157), recurrent GBM (n = 13), low-grade gliomas (LGG; n = 514), recurrent low-grade gliomas (n = 17), and normal brain (n = 5), and whole genome sequencing (WGS) datasets from primary GBM (n = 51), recurrent GBM (n = 10), and normal matched blood samples (n = 20). In addition, RNA-seq datasets from MRI-guided biopsies (n = 92) and glioma stem-like cell cultures (n = 9) were analyzed. Sixty-four DNA-seq datasets from 11 meningiomas and their corresponding blood control samples were also analyzed. Finally, three primary GBM tissue samples were obtained, sequenced using RNA-seq, and analyzed. After in-depth analysis, the most robust virus findings were the detection of human papillomavirus (HPV) and hepatitis B reads in occasional LGG samples (4 samples and 1 sample, respectively). In addition, low numbers of virus reads were detected in several datasets, but detailed investigation of these reads suggests that they likely represent artifacts or non-pathological infections. For example, all of the sporadic low-level HCMV reads mapped to the immediate early promoter, indicating that they likely originated from laboratory expression vector contamination. Although low numbers of Epstein-Barr virus reads were detected in some samples, these likely originated from infiltrating B cells.
Finally, reads aligning to human herpesvirus 6 and 7 were identified in all DNA-seq and a few RNA-seq datasets, but detailed analysis demonstrated that these were likely derived from homologous human telomeric-like repeats. Other low-abundance viral reads were detected in some samples, but for most viruses these reads likely represent artifacts or incidental infections. This analysis argues against associations between most known viruses and GBM or meningiomas. Nevertheless, there may be a low-percentage association between HPV and/or hepatitis B and LGGs.
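The HCMV argument above rests on where the viral reads land: if every read falls inside one element that is also common in laboratory vectors, contamination is more likely than infection. As a hypothetical sketch of that check (not the authors' pipeline), the function below reports the fraction of a virus's aligned reads that overlap a given interval. The interval and read coordinates are placeholders, not real HCMV promoter coordinates.

```python
from typing import Iterable, Tuple


def fraction_in_region(
    reads: Iterable[Tuple[int, int]],  # (start, end) of each aligned read, inclusive
    region: Tuple[int, int],           # (start, end) of the region of interest, inclusive
) -> float:
    """Fraction of reads overlapping the region by at least one base."""
    r_start, r_end = region
    reads = list(reads)
    if not reads:
        return 0.0
    # Two closed intervals overlap when each starts before the other ends.
    hits = sum(1 for s, e in reads if s <= r_end and e >= r_start)
    return hits / len(reads)


# Placeholder coordinates: two of three reads fall in the region.
print(fraction_in_region([(100, 150), (120, 170), (900, 950)], (90, 200)))
```

A fraction close to 1.0 for a vector-associated element is only a red flag, not proof. The same concentration could arise from a genuine but highly localized signal, so follow-up such as checking for vector backbone sequence in the same reads would still be needed.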
9. Wang Q, An Y, Yuan Q, Qi Y, Ou Y, Chen J, Huang J. Identification of allelic expression imbalance genes in human hepatocellular carcinoma through massively parallel DNA and RNA sequencing. Med Oncol 2016; 33:38. [PMID: 27000824 DOI: 10.1007/s12032-016-0751-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/02/2016] [Accepted: 02/27/2016] [Indexed: 12/15/2022]
Abstract
Hepatocellular carcinoma (HCC) is a common malignant tumor worldwide. The prognosis and treatment of this disease have changed little in recent decades because the mechanisms underlying most events in this disease remain obscure. Allelic variation in gene expression is associated with many important biological processes and provides a new perspective for understanding HCC pathogenesis at the molecular level. To identify allelic expression imbalance (AEI) genes in HCC, we developed a computational method that combines accurate mapping with rigorous AEI detection using paired DNA-seq and RNA-seq data. We analyzed DNA-seq and RNA-seq data derived from two HCC samples and two cell lines. Applying a strict criterion, we identified a total of 203 tumor-specific AEI genes with high confidence, several of which, such as RELN and DHRS3, have been reported to be associated with the migration or proliferation of cancer cells. In addition, we found novel AEI genes in HCC, such as HNRNPR and PTAFR. Our study provides new insight into AEI events that may contribute to understanding gene expression regulation, cell proliferation and migration, and tumorigenesis.
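The abstract does not give the authors' exact AEI criterion. A common generic approach, shown here as a sketch under that assumption, works in two steps. First, take a SNP that the DNA-seq data call heterozygous and count reference and alternate alleles among RNA-seq reads at that site. Second, test whether the reference fraction departs from the 0.5 expected under balanced expression, using an exact two-sided binomial test. The function names are hypothetical. Real pipelines also correct for reference mapping bias and for testing many sites at once.

```python
from math import comb
from typing import Tuple


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps exact ties from being lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def allelic_imbalance(ref_reads: int, alt_reads: int, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Return (reference allele fraction, p-value, imbalanced?) for one heterozygous site."""
    n = ref_reads + alt_reads
    pval = binomial_two_sided_p(ref_reads, n)
    return ref_reads / n, pval, pval < alpha


print(allelic_imbalance(18, 2))  # reference fraction 0.9, p about 0.0004, flagged
```

An 18:2 split at 20 reads is flagged, while an 11:9 split is not. Low-coverage sites rarely reach significance at all, which is one reason strict depth cutoffs are typically applied before testing.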
Affiliation(s)
- Qiudao Wang
- Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, No. 800, Dongchuan Road, Minhang District, Shanghai, 200240, China
- National Engineering Center for Biochip at Shanghai, Shanghai, 201203, China
- Yan An
- Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, No. 800, Dongchuan Road, Minhang District, Shanghai, 200240, China
- Qing Yuan
- National Engineering Center for Biochip at Shanghai, Shanghai, 201203, China
- Yao Qi
- National Engineering Center for Biochip at Shanghai, Shanghai, 201203, China
- Ying Ou
- National Engineering Center for Biochip at Shanghai, Shanghai, 201203, China
- Junhui Chen
- Peking University Shenzhen Hospital, Shenzhen, 518036, China
- Jian Huang
- Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, No. 800, Dongchuan Road, Minhang District, Shanghai, 200240, China
- National Engineering Center for Biochip at Shanghai, Shanghai, 201203, China
- Shenzhen Key Laboratory of Infection and Immunity, Shenzhen Third People's Hospital, Guangdong Medical College, Shenzhen, 518112, China
- Shanghai-MOST Key Laboratory for Disease and Health Genomics, Chinese National Human Genome Center, Shanghai, 201203, China
10. Wang Y, Yi L, Wang S, Lu C, Ding C. Selective capture of transcribed sequences in the functional gene analysis of microbial pathogens. Appl Microbiol Biotechnol 2014; 98:9983-92. [PMID: 25381492 DOI: 10.1007/s00253-014-6190-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/22/2014] [Revised: 10/23/2014] [Accepted: 10/25/2014] [Indexed: 01/26/2023]
Abstract
Selective capture of transcribed sequences (SCOTS) is an effective method to identify bacterial genes differentially expressed during different biological processes, including pathogenic interactions with a host species. The method can be used to elucidate molecular mechanisms driving and maintaining such interactions. The method is a powerful genetic tool that overcomes limitations found in other methods, by working with small amounts of mRNA and allowing for the separation of bacterial cDNA from host cDNA. It has been increasingly used in the discovery of genes involved in the bacterium-host interaction. In this review, we briefly introduce the SCOTS method, outline the technical advances offered in the method, and focus on the method's applications in several microbial pathogens.
Affiliation(s)
- Yang Wang
- College of Animal Science and Technology, Henan University of Science and Technology, Luoyang, China
11. Exon expression QTL (eeQTL) analysis highlights distant genomic variations associated with splicing regulation. QUANTITATIVE BIOLOGY 2014. [DOI: 10.1007/s40484-014-0031-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
12. RNA-Seq identifies key reproductive gene expression alterations in response to cadmium exposure. BIOMED RESEARCH INTERNATIONAL 2014; 2014:529271. [PMID: 24982889 PMCID: PMC4058285 DOI: 10.1155/2014/529271] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/14/2014] [Revised: 05/07/2014] [Accepted: 05/07/2014] [Indexed: 12/14/2022]
Abstract
Cadmium is a common toxicant that is detrimental to many tissues. Although a number of transcriptional signatures have been revealed in different tissues after cadmium treatment, the genes involved in cadmium-induced male reproductive toxicity and the underlying molecular mechanisms remain unclear. Here we observed that mice given different amounts of cadmium in their rodent chow for six months exhibited reduced serum testosterone. We then performed RNA-seq to comprehensively investigate the mouse testicular transcriptome and further elucidate the mechanism. Our results showed that the expression of hundreds of genes was significantly altered in response to cadmium treatment. In particular, we found several transcriptional signatures closely related to the biological processes of hormone regulation, gamete generation, and sexual reproduction. The expression of several key testosterone-synthesis enzyme genes, such as Star, Cyp11a1, and Cyp17a1, was inhibited by cadmium exposure. To better understand the cadmium-mediated transcriptional regulation of these genes, we computationally analyzed the transcription factor binding sites and microRNA targets of the differentially expressed genes. Our findings suggest that cadmium-induced reproductive toxicity in mice involves deregulation at multiple layers, spanning several biological processes and transcriptional regulation.
13. Alamancos GP, Agirre E, Eyras E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol Biol 2014; 1126:357-97. [PMID: 24549677 DOI: 10.1007/978-1-62703-980-2_26] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 12/02/2022]
Abstract
The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has made this a challenging task. In the last few years, a plethora of tools have been developed that allow researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data, which could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.
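One widely used summary for the splicing events and condition-to-condition changes this review covers is percent spliced in (PSI): the fraction of reads informative for an event that support inclusion of an exon. The sketch below uses the simplest junction-count form for a cassette (skipped) exon, as an illustration rather than any specific tool's method. Inclusion reads come from the two junctions flanking the exon, so they are averaged per junction before comparison with the single skipping junction. Many tools also normalize for the number of possible read positions, which is omitted here. The function names are hypothetical.

```python
from typing import Tuple


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       inclusion_junctions: int = 2) -> float:
    """PSI for a cassette exon from junction read counts (0 = always skipped, 1 = always included)."""
    # Average the inclusion evidence over the junctions that can produce it.
    inc = inclusion_reads / inclusion_junctions
    exc = float(exclusion_reads)
    if inc + exc == 0:
        raise ValueError("no informative reads for this event")
    return inc / (inc + exc)


def delta_psi(cond_a: Tuple[int, int], cond_b: Tuple[int, int]) -> float:
    """PSI(condition B) minus PSI(condition A); each argument is (inclusion, exclusion) reads."""
    return percent_spliced_in(*cond_b) - percent_spliced_in(*cond_a)


print(percent_spliced_in(80, 10))  # 0.8
```

For example, moving from (80, 10) in one condition to (20, 40) in another gives a PSI shift of about -0.6, a strong switch toward exon skipping. Tools differ mainly in how they model uncertainty around such estimates at low coverage.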
Affiliation(s)
- Gael P Alamancos
- Computational Genomics, Universitat Pompeu Fabra, Barcelona, Spain
14. Piskol R, Ramaswami G, Li J. Reliable identification of genomic variants from RNA-seq data. Am J Hum Genet 2013; 93:641-51. [PMID: 24075185 DOI: 10.1016/j.ajhg.2013.08.008] [Citation(s) in RCA: 237] [Impact Index Per Article: 19.8] [Received: 06/11/2013] [Revised: 07/25/2013] [Accepted: 08/09/2013] [Indexed: 12/20/2022]
Abstract
Identifying genomic variation is a crucial step for unraveling the relationship between genotype and phenotype and can yield important insights into human diseases. Prevailing methods rely on cost-intensive whole-genome sequencing (WGS) or whole-exome sequencing (WES), whereas identifying genomic variants from RNA sequencing (RNA-seq) data, which often already exist, remains a challenge because of the intrinsic complexity of the transcriptome. Here, we present a highly accurate approach termed SNPiR to identify SNPs in RNA-seq data. We applied SNPiR to RNA-seq data from samples for which WGS and WES data are also available and achieved high specificity and sensitivity. Of the SNPs called from the RNA-seq data, >98% were also identified by WGS or WES. Over 70% of all expressed coding variants were identified from RNA-seq, and comparable numbers of exonic variants were identified in RNA-seq and WES. Despite our method's limitation to detecting variants only in expressed regions, our results demonstrate that SNPiR outperforms current state-of-the-art approaches for variant detection from RNA-seq data and offers a cost-effective and reliable alternative for SNP discovery.
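The two headline numbers in this abstract are set comparisons. The share of RNA-seq calls also found by WGS or WES is a precision-like measure, and the share of expressed coding variants recovered from RNA-seq is a sensitivity-like measure. The sketch below computes both for variants keyed by (chromosome, position, alternate allele). It does not implement SNPiR's calling or filtering. The variant tuples are made up, and the "expressed truth" set is assumed to be prepared beforehand from DNA calls in expressed coding regions.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str]  # (chromosome, 1-based position, alternate allele)


def concordance(rna_calls: Set[Variant], dna_calls: Set[Variant],
                expressed_truth: Set[Variant]) -> Tuple[float, float]:
    """Return (fraction of RNA calls confirmed by DNA calls,
    fraction of expressed truth variants recovered by RNA calls)."""
    confirmed = len(rna_calls & dna_calls) / len(rna_calls) if rna_calls else 0.0
    recovered = (len(rna_calls & expressed_truth) / len(expressed_truth)
                 if expressed_truth else 0.0)
    return confirmed, recovered


rna = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T")}
dna = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr3", 10, "C")}
truth = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr4", 5, "T"), ("chr5", 7, "G")}
print(concordance(rna, dna, truth))
```

Matching on exact (chromosome, position, allele) keys is the simplest choice. Real comparisons also need consistent reference builds and normalized representations of multi-allelic sites.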
15
Bonfert T, Csaba G, Zimmer R, Friedel CC. Mining RNA-seq data for infections and contaminations. PLoS One 2013; 8:e73071. [PMID: 24019895 PMCID: PMC3760913 DOI: 10.1371/journal.pone.0073071] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/14/2013] [Accepted: 07/16/2013] [Indexed: 02/06/2023]
Abstract
RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore sources of expression other than the host cell and are poorly equipped to address the problems arising from redundancies and gaps among sequenced microbial and viral genomes. We show that screening sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, the mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences) of species/strains can be assessed. The performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and by missing genome sequences. Our study illustrates the importance and potential of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. Using ContextMap, gene expression of infecting agents can be analyzed and novel insights into infection processes and tumorigenesis can be obtained.
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Gergely Csaba
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Ralf Zimmer
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
- Caroline C. Friedel
- Institute for Informatics, Ludwig–Maximilians–Universität München, Munich, Germany
16
Bianchi V, Colantoni A, Calderone A, Ausiello G, Ferrè F, Helmer-Citterich M. DBATE: database of alternative transcripts expression. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat050. [PMID: 23842462 PMCID: PMC5654372 DOI: 10.1093/database/bat050] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022]
Abstract
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole-transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing-variant-specific expression with its functional inference is still an open and difficult problem, which is why we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from healthy human tissues and disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The ability to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values, which can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help researchers appreciate how, and possibly why, transcriptome expression is shaped. Database URL: http://bioinformatica.uniroma2.it/DBATE/.
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica s.n.c., Rome 00133, Italy
17
Lindner R, Friedel CC. A comprehensive evaluation of alignment algorithms in the context of RNA-seq. PLoS One 2012; 7:e52403. [PMID: 23300661 PMCID: PMC3530550 DOI: 10.1371/journal.pone.0052403] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 08/30/2012] [Accepted: 11/16/2012] [Indexed: 11/25/2022]
Abstract
Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides a single experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step, and a major challenge, in the analysis of such experiments is mapping sequencing reads to their transcriptomic origin, including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which rely on algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index-based aligner Bowtie has become a de facto standard within mapping pipelines, many other alignment algorithms have been developed, including other FM-index-based aligners. Accordingly, developers and users of RNA-seq mapping pipelines can choose among a large number of available alignment algorithms. To provide guidance in this choice, we evaluated the performance of 14 widely used alignment programs from three algorithmic classes: algorithms using hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Special emphasis was placed on precision and recall, and on performance for different read lengths and different numbers of mismatches and indels in a read. Our results showed a significant reduction in memory footprint and runtime for FM-index-based aligners, at a precision and recall comparable to the best hash-table-based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm showed a remarkable tolerance to both sequencing errors and indels, essentially making hash-based aligners obsolete.
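FM-index-based aligners answer exact-match queries by "backward search" over the Burrows-Wheeler transform (BWT) of the reference, which is what lets them avoid large hash tables. The sketch below is a toy, uncompressed illustration of that idea for exact matches only; it is not Bowtie's or Bowtie 2's implementation, which use compressed, sampled occurrence tables and also handle mismatches and gaps. The function names are hypothetical.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table and occurrence table."""
    text += "$"  # sentinel; '$' sorts before A, C, G and T
    # Naive suffix array construction (fine for short toy strings only).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: character preceding each sorted suffix (text[-1] is '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[ch]: number of characters in the text that sort before ch.
    counts = Counter(text)
    c_table, running = {}, 0
    for ch in sorted(counts):
        c_table[ch] = running
        running += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (ch == b)
    return sa, c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval of suffixes starting with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern, text):
    """Return sorted 0-based positions of exact occurrences of pattern in text."""
    sa, c_table, occ = build_fm_index(text)
    lo, hi = backward_search(pattern, c_table, occ, len(sa))
    return sorted(sa[lo:hi])


print(locate("ACG", "ACGTACGA"))  # [0, 4]
```

Each search step costs two table lookups regardless of the reference length, which is the property behind the small memory footprint and fast runtime reported for this class; real aligners trade the full `occ` table used here for sampled checkpoints.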
Affiliation(s)
- Robert Lindner
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, Heidelberg, Germany
- Caroline C. Friedel
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
18
Abstract
Accurately mapping RNA-Seq reads to the reference genome is a critical step for downstream analyses such as transcript assembly, isoform detection and quantification. Many tools have been developed; however, given the huge size of next-generation sequencing datasets and the complexity of the transcriptome, RNA-Seq read mapping remains a challenge as the amount of data keeps increasing. We developed Omicsoft sequence aligner (OSA), a fast and accurate alignment tool for RNA-Seq data. Benchmarked against existing methods, OSA improves mapping speed 4- to 10-fold with better sensitivity and fewer false positives. Availability: OSA can be downloaded from http://omicsoft.com/osa. It is free for academic users. OSA has been tested extensively on Linux, Mac OS X and Windows platforms.
Affiliation(s)
- Jun Hu
- Division of Bioinformatics, Omicsoft Inc., 164 Quade Drive, Cary, NC 27513, USA.
19
Bonfert T, Csaba G, Zimmer R, Friedel CC. A context-based approach to identify the most likely mapping for RNA-seq experiments. BMC Bioinformatics 2012; 13 Suppl 6:S9. [PMID: 22537048 PMCID: PMC3358662 DOI: 10.1186/1471-2105-13-s6-s9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Background: Sequencing of mRNA (RNA-seq) by next-generation sequencing technologies is widely used to analyze the transcriptomic state of a cell. Here, one of the main challenges is mapping each sequenced read to its transcriptomic origin. Because a simple alignment to the genome fails to identify reads crossing splice junctions, and a transcriptome alignment misses novel splice sites, several approaches have been developed for this purpose. Most of these approaches have two drawbacks. First, each read is assigned to a location independently of whether the corresponding gene is expressed, i.e. information from other reads is not taken into account. Second, when a read has multiple possible mappings, the mapping with the fewest mismatches is usually chosen, which may lead to wrong assignments due to sequencing errors. Results: To address these problems, we developed ContextMap, which efficiently uses information on the context of a read, i.e. reads mapping to the same expressed region. The context information is used to resolve possible ambiguities; thus, a much larger degree of ambiguity can be allowed in the initial stage in order to detect all possible candidate positions. Although ContextMap can be used as a stand-alone tool with either a genome or a transcriptome as input, the version presented in this article focuses on refining initial mappings provided by other mapping algorithms. Evaluation on simulated sequencing reads showed that applying ContextMap to either TopHat or MapSplice mappings considerably improved the mapping accuracy of both. Conclusions: In this article, we show that the context of reads mapping to nearby locations provides valuable information for identifying the best unique mapping for a read. Using our method, mappings provided by other state-of-the-art methods can be refined and alignment accuracy can be further improved. Availability: http://www.bio.ifi.lmu.de/ContextMap.
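The abstract's core idea, scoring candidate positions by how many other reads support the surrounding region rather than by mismatch count alone, can be sketched as below. This is a deliberately simplified illustration under stated assumptions: the `Candidate` type, the `resolve_by_context` function, the use of uniquely mapped reads as "context", and the fixed `window` parameter are all hypothetical choices, not ContextMap's published algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible alignment of a read (hypothetical representation)."""
    chrom: str
    pos: int
    mismatches: int


def resolve_by_context(reads, window=1000):
    """Pick one candidate per read, preferring regions supported by other reads.

    reads: dict mapping read_id -> list of Candidate objects.
    window: hypothetical distance (bp) within which another read counts as context.
    Simplified illustration only; not ContextMap's actual scoring.
    """
    # Reads with a single candidate act as anchors that mark expressed regions.
    anchors = [cands[0] for cands in reads.values() if len(cands) == 1]

    def support(cand):
        # Number of anchor reads on the same chromosome within `window` bp.
        return sum(
            1 for a in anchors
            if a.chrom == cand.chrom and abs(a.pos - cand.pos) <= window
        )

    resolved = {}
    for read_id, cands in reads.items():
        if not cands:
            continue  # unmapped read: nothing to resolve
        # Most context support wins; fewer mismatches only breaks ties.
        resolved[read_id] = max(cands, key=lambda c: (support(c), -c.mismatches))
    return resolved


reads = {
    "r1": [Candidate("chr1", 100, 0)],
    "r2": [Candidate("chr1", 150, 0)],
    # Fewest-mismatch selection would pick chr7; context favors chr1.
    "r3": [Candidate("chr1", 120, 2), Candidate("chr7", 5000, 0)],
}
print(resolve_by_context(reads)["r3"])  # Candidate(chrom='chr1', pos=120, mismatches=2)
```

In this toy example, read r3 is assigned to chr1 despite having more mismatches there, because two other reads map nearby; under the fewest-mismatch rule criticized in the abstract it would have gone to chr7.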
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig-Maximilians-University Munich, Amalienstr. 17, 80333 Munich, Germany