1
Rosati D, Palmieri M, Brunelli G, Morrione A, Iannelli F, Frullanti E, Giordano A. Differential gene expression analysis pipelines and bioinformatic tools for the identification of specific biomarkers: A review. Comput Struct Biotechnol J 2024; 23:1154-1168. [PMID: 38510977 PMCID: PMC10951429 DOI: 10.1016/j.csbj.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/20/2024] [Accepted: 02/20/2024] [Indexed: 03/22/2024] Open
Abstract
In recent years, bioinformatics and computational biology, together with omics techniques and transcriptomics, have gained tremendous importance in biomedicine and healthcare, particularly for the identification of biomarkers for precision medicine and drug discovery. Differential gene expression (DGE) analysis is one of the most used techniques for RNA-sequencing (RNA-seq) data analysis. This technique, which is typically used in various RNA-seq data processing applications, allows the identification of differentially expressed genes across two or more sample sets. Functional enrichment analyses can then be performed to annotate and contextualize the resulting gene lists. These studies provide valuable information about disease-causing biological processes and can help in identifying molecular targets for novel therapies. This review focuses on DGE analysis pipelines and bioinformatic techniques commonly used to identify specific biomarkers and discusses the advantages and disadvantages of these techniques.
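As a rough illustration of what a DGE comparison computes at its core, the sketch below derives a per-gene log2 fold change between two sample sets from already-normalized expression values. The gene names, the values, the pseudocount, and the helper name `log2_fold_changes` are all hypothetical and are not taken from the review; real pipelines add a statistical model and multiple-testing correction on top of this step.

```python
import math
from statistics import mean


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """Return {gene: log2 fold change of group_b over group_a}.

    Each argument maps a gene name to a list of normalized expression
    values, one per sample. The pseudocount (an assumption of this sketch)
    avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene in group_a.keys() & group_b.keys():
        a = mean(group_a[gene]) + pseudocount
        b = mean(group_b[gene]) + pseudocount
        result[gene] = math.log2(b / a)
    return result


# Hypothetical toy data: GENE1 is up in the treated group, GENE2 is unchanged.
control = {"GENE1": [7.0, 7.0, 7.0], "GENE2": [3.0, 3.0, 3.0]}
treated = {"GENE1": [31.0, 31.0, 31.0], "GENE2": [3.0, 3.0, 3.0]}
lfc = log2_fold_changes(control, treated)
```

A positive value means higher expression in the second group. A fold change alone says nothing about significance, which is why the pipelines the review covers pair it with a statistical test.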
Affiliation(s)
- Diletta Rosati
  - Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
  - Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
  - Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
- Maria Palmieri
  - Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
  - Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
- Giulia Brunelli
  - Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
- Andrea Morrione
  - Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Francesco Iannelli
  - Laboratory of Molecular Microbiology and Biotechnology, Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Elisa Frullanti
  - Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
  - Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
- Antonio Giordano
  - Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
  - Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
2
Bandopadhyay L, Basu D, Ranjan Sikdar S. De novo transcriptome assembly and global analysis of differential gene expression of aphid tolerant wild mustard Rorippa indica (L.) Hiern infested by mustard aphid Lipaphis Erysimi (L.) Kaltenbach. Funct Integr Genomics 2024; 24:43. [PMID: 38418630 DOI: 10.1007/s10142-024-01323-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 03/02/2024]
Abstract
Rapeseed-mustard, the oleiferous Brassica species, are important oilseed crops cultivated all over the globe. Mustard aphid Lipaphis erysimi (L.) Kaltenbach is a major threat to the cultivation of rapeseed-mustard. Wild mustard Rorippa indica (L.) Hiern shows tolerance to mustard aphids as a nonhost and hence is an important source for the bioprospecting of potential resistance genes and defense measures to manage mustard aphids sustainably. We performed mRNA sequencing of R. indica plants, both uninfested and infested by mustard aphids, harvested at 24 hours post-infestation. Following quality control, the high-quality reads were subjected to de novo assembly of the transcriptome. As there is no genomic information available for this potential wild plant, the raw reads will be useful for further bioinformatics analysis, and the sequence information of the assembled transcripts will be helpful for designing primers to characterize specific gene sequences. In this study, we also used the generated resource to comprehensively analyse the global profile of differential gene expression in R. indica in response to infestation by mustard aphids. The functional enrichment analysis of the differentially expressed genes reveals a significant immune response and suggests the possibility of chitin-induced defense signaling.
Affiliation(s)
- Lekha Bandopadhyay
  - Division of Plant Biology, Bose Institute, P 1/12, C. I. T. Road, Scheme VIIM, Kolkata, 700054, India
- Debabrata Basu
  - Division of Plant Biology, Bose Institute, P 1/12, C. I. T. Road, Scheme VIIM, Kolkata, 700054, India
- Samir Ranjan Sikdar
  - Division of Plant Biology, Bose Institute, P 1/12, C. I. T. Road, Scheme VIIM, Kolkata, 700054, India
3
Sorokin M, Buzdin AA, Guryanova A, Efimov V, Suntsova MV, Zolotovskaia MA, Koroleva EV, Sekacheva MI, Tkachev VS, Garazha A, Kremenchutckaya K, Drobyshev A, Seryakov A, Gudkov A, Alekseenko IV, Rakitina O, Kostina MB, Vladimirova U, Moisseev A, Bulgin D, Radomskaya E, Shestakov V, Baklaushev VP, Prassolov V, Shegay PV, Li X, Poddubskaya EV, Gaifullin N. Large-scale assessment of pros and cons of autopsy-derived or tumor-matched tissues as the norms for gene expression analysis in cancers. Comput Struct Biotechnol J 2023; 21:3964-3986. [PMID: 37635765 PMCID: PMC10448432 DOI: 10.1016/j.csbj.2023.07.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 07/17/2023] [Accepted: 07/30/2023] [Indexed: 08/29/2023] Open
Abstract
Normal tissues are essential for studying disease-specific differential gene expression. However, healthy human controls are typically available only in postmortal/autopsy settings. In cancer research, fragments of pathologically normal tissue adjacent to the tumor site are frequently used as controls. However, it is largely underexplored how cancers can systematically influence gene expression of the neighboring tissues. Here we performed a comprehensive pan-cancer comparison of molecular profiles of solid tumor-adjacent and autopsy-derived "healthy" normal tissues. We found a number of systemic molecular differences related to activation of the immune cells, intracellular transport and autophagy, cellular respiration, telomerase activation, p38 signaling, cytoskeleton remodeling, and reorganization of the extracellular matrix. The tumor-adjacent tissues were deficient in apoptotic signaling and negative regulation of cell growth, including the G2/M cell cycle transition checkpoint. We also detected an extensive rearrangement of the chemical perception network. Molecular targets of 32 and 37 cancer drugs were over- or underexpressed, respectively, in the tumor-adjacent norms. These processes may be driven by molecular events that are correlated between the paired cancer and adjacent normal tissues and that mostly relate to inflammation and regulation of intracellular molecular pathways such as the p38, MAPK, Notch, and IGF1 signaling. However, using a model of macaque postmortal tissues, we showed that over the 30 min to 24 hour time frame at 4 °C, an RNA degradation pattern in lung biosamples resulted in an artifact "differential" expression profile for 1140 genes, although no differences could be detected in liver. Thus, such concerns should be addressed in practice.
Affiliation(s)
- Maksim Sorokin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
  - Omicsway Corp., Walnut, CA 91789, USA
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Anton A. Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
- Anastasia Guryanova
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Victor Efimov
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
- Maria V. Suntsova
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Marianna A. Zolotovskaia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
  - Omicsway Corp., Walnut, CA 91789, USA
- Elena V. Koroleva
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Marina I. Sekacheva
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Victor S. Tkachev
  - Omicsway Corp., Walnut, CA 91789, USA
  - Oncobox Ltd., Moscow 121205, Russia
- Andrew Garazha
  - Omicsway Corp., Walnut, CA 91789, USA
  - Oncobox Ltd., Moscow 121205, Russia
- Aleksey Drobyshev
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Alexander Gudkov
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Irina V. Alekseenko
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
  - Institute of Molecular Genetics of National Research Centre "Kurchatov Institute", 2, Kurchatov Square, Moscow 123182, Russia
  - FSBI "National Medical Research Center for Obstetrics, Gynecology and Perinatology named after Academician V.I. Kulakov" Ministry of Healthcare of the Russian Federation, Moscow 117198, Russia
- Olga Rakitina
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Maria B. Kostina
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Uliana Vladimirova
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
  - Oncobox Ltd., Moscow 121205, Russia
- Aleksey Moisseev
  - I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Dmitry Bulgin
  - Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Elena Radomskaya
  - Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Viktor Shestakov
  - Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Vladimir Prassolov
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32 Vavilova str., Moscow 119991, Russia
- Petr V. Shegay
  - National Medical Research Radiological Center of the Ministry of Health of the Russian Federation, 249036 Obninsk, Russia
- Xinmin Li
  - UCLA Technology Center for Genomics & Bioinformatics, Department of Pathology & Laboratory Medicine, 650 Charles E Young Dr., Los Angeles, CA 90095, USA
- Nurshat Gaifullin
  - Department of Physiology and General Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow 119991, Russia
4
Brozzetti L, Scambi I, Bertoldi L, Zanini A, Malacrida G, Sacchetto L, Baldassa L, Benvenuto G, Mariotti R, Zanusso G, Cecchini MP. RNAseq analysis of olfactory neuroepithelium cytological samples in individuals with Down syndrome compared to euploid controls: a pilot study. Neurol Sci 2023; 44:919-930. [PMID: 36394661 PMCID: PMC9925603 DOI: 10.1007/s10072-022-06500-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Accepted: 11/05/2022] [Indexed: 11/18/2022]
Abstract
Down syndrome is a common genetic disorder caused by partial or complete triplication of chromosome 21. This syndrome shows an overall and progressive impairment of olfactory function, detected early in adulthood. The olfactory neuronal cells are located in the nasal olfactory mucosa and represent the first sensory neurons of the olfactory pathway. Herein, we applied the olfactory swabbing procedure to allow a gentle collection of olfactory epithelial cells in seven individuals with Down syndrome and in ten euploid controls. The aim of this research was to investigate the peripheral gene expression pattern in olfactory epithelial cells through RNAseq analysis. Validated tests (Sniffin' Sticks Extended test) were used to assess olfactory function. Olfactory scores were correlated with RNAseq results and cognitive scores (Vineland II and Leiter scales). All Down syndrome individuals showed both olfactory deficit and intellectual disability. Down syndrome individuals and euploid controls exhibited clear expression differences in genes located both on and outside chromosome 21. In addition, a significant correlation was found between olfactory test scores and gene expression, while a non-significant correlation emerged between olfactory and cognitive scores. This first preliminary step gives new insights into Down syndrome olfactory system research, starting from the olfactory neuroepithelium, the first cellular step in the olfactory pathway.
Affiliation(s)
- Lorenzo Brozzetti
  - Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Unit, University of Verona, Verona, Italy
- Ilaria Scambi
  - Department of Neurosciences, Biomedicine and Movement Sciences, Anatomy and Histology Section, University of Verona, Strada Le Grazie 8, 37134, Verona, Italy
- Alice Zanini
  - Department of Neurosciences, Biomedicine and Movement Sciences, Anatomy and Histology Section, University of Verona, Strada Le Grazie 8, 37134, Verona, Italy
- Luca Sacchetto
  - Department of Surgery, Dentistry, Paediatrics and Gynaecology, Otolaryngology Section, University of Verona, Verona, Italy
- Lucia Baldassa
  - AGBD, Associazione Sindrome di Down, Onlus, Verona, Italy
- Raffaella Mariotti
  - Department of Neurosciences, Biomedicine and Movement Sciences, Anatomy and Histology Section, University of Verona, Strada Le Grazie 8, 37134, Verona, Italy
- Gianluigi Zanusso
  - Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Unit, University of Verona, Verona, Italy
- Maria Paola Cecchini
  - Department of Neurosciences, Biomedicine and Movement Sciences, Anatomy and Histology Section, University of Verona, Strada Le Grazie 8, 37134, Verona, Italy
5
Holton KM, Giadone RM, Lang BJ, Calderwood SK. A Workflow Guide to RNA-Seq Analysis of Chaperone Function and Beyond. Methods Mol Biol 2023; 2693:39-60. [PMID: 37540425 DOI: 10.1007/978-1-0716-3342-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
RNA sequencing (RNA-seq) is a powerful method of transcriptional analysis that allows for the sequence identification and quantification of cellular transcripts. RNA-seq can be used for differential gene expression (DGE) analysis, gene fusion detection, allele-specific expression, isoform and splice variant quantification, and identification of novel genes. These applications can be used for downstream systems biology analyses such as gene ontology or pathway analysis to provide insight into processes altered between biological conditions. Given the wide range of signaling pathways subject to chaperone activity as well as numerous chaperone functions in RNA metabolism, RNA-seq may provide a valuable tool for the study of chaperone proteins in biology and disease. This chapter outlines an example RNA-seq workflow to determine differentially expressed (DE) genes between two or more sample conditions and provides some considerations for RNA-seq experimental design.
Affiliation(s)
- Kristina M Holton
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Richard M Giadone
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin J Lang
  - Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Stuart K Calderwood
  - Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
6
Özbek M, Toy HI, Oktay Y, Karakülah G, Suner A, Pavlopoulou A. An in silico approach to the identification of diagnostic and prognostic markers in low-grade gliomas. PeerJ 2023; 11:e15096. [PMID: 36945359 PMCID: PMC10024901 DOI: 10.7717/peerj.15096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 02/28/2023] [Indexed: 03/18/2023] Open
Abstract
Low-grade gliomas (LGG) are central nervous system Grade I tumors that, as they progress, become among the deadliest brain tumors. There is still a great need for timely and accurate diagnosis and prognosis of LGG. Herein, we aimed to identify diagnostic and prognostic biomarkers associated with LGG by employing diverse computational approaches. For this purpose, differential gene expression analysis was performed on high-throughput transcriptomics data of LGG versus corresponding healthy brain tissue, derived from TCGA and GTEx, respectively. Weighted gene co-expression network analysis of the detected differentially expressed genes was carried out in order to identify modules of co-expressed genes significantly correlated with LGG clinical traits. The genes comprising these modules were further used to construct gene co-expression and protein-protein interaction networks. Based on the network analyses, we derived a consensus of eighteen hub genes, namely, CD74, CD86, CDC25A, CYBB, HLA-DMA, ITGB2, KIF11, KIFC1, LAPTM5, LMNB1, MKI67, NCKAP1L, NUSAP1, SLC7A7, TBXAS1, TOP2A, TYROBP, and WDFY4. All detected hub genes were up-regulated in LGG and were also associated with unfavorable prognosis in LGG patients. The findings of this study could be applicable in the clinical setting for diagnosing and monitoring LGG.
Affiliation(s)
- Melih Özbek
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- Halil Ibrahim Toy
  - Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States
- Yavuz Oktay
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Faculty of Medicine, Department of Medical Biology, Dokuz Eylül University, Izmir, Turkey
- Gökhan Karakülah
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- Aslı Suner
  - Faculty of Medicine, Department of Biostatistics and Medical Informatics, Izmir, Turkey
- Athanasia Pavlopoulou
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
7
Li X, Qi J, Song X, Xu X, Pan T, Wang H, Yang J, Han Y. DLC1 deficiency at diagnosis predicts poor prognosis in acute myeloid leukemia. Exp Hematol Oncol 2022; 11:74. [PMID: 36258263 DOI: 10.1186/s40164-022-00335-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 10/08/2022] [Indexed: 11/10/2022] Open
Abstract
Acute myeloid leukemia (AML) is a complex, heterogeneous malignant hematologic disease. Although multiple prognostic-related genes have been explored in previous studies, there are still many genes whose prognostic value remains unclear. In this study, a total of 1532 AML patients from three GEO databases were included, and five genes with potential prognostic value (DLC1, NF1B, DENND5B, TANC2 and ELAVL4) were screened by weighted gene co-expression network analysis (WGCNA), least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVM-RFE). Based on this, we conducted survival analysis of the above five genes through the TCGA database and found that a low level of DLC1 was detrimental to the long-term prognosis of AML patients. We also performed external validation in 48 AML patients from our medical center to analyze the impact of DLC1 level on prognosis. In conclusion, DLC1 may be a potential marker affecting the prognosis of AML, and its deficiency is associated with poor prognosis.
8
Malm M, Kuo CC, Barzadd MM, Mebrahtu A, Wistbacka N, Razavi R, Volk AL, Lundqvist M, Kotol D, Tegel H, Hober S, Edfors F, Gräslund T, Chotteau V, Field R, Varley PG, Roth RG, Lewis NE, Hatton D, Rockberg J. Harnessing secretory pathway differences between HEK293 and CHO to rescue production of difficult to express proteins. Metab Eng 2022; 72:171-187. [PMID: 35301123 PMCID: PMC9189052 DOI: 10.1016/j.ymben.2022.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 02/09/2022] [Accepted: 03/10/2022] [Indexed: 10/31/2022]
Abstract
Biologics represent the fastest growing group of therapeutics, but many advanced recombinant protein moieties remain difficult to produce. Here, we identify metabolic engineering targets limiting expression of recombinant human proteins through a systems biology analysis of the transcriptomes of CHO and HEK293 during recombinant expression. In an expression comparison of 24 difficult to express proteins, one third of the challenging human proteins displayed improved secretion upon host cell swapping from CHO to HEK293. Guided by a comprehensive transcriptomics comparison between cell lines, especially highlighting differences in secretory pathway utilization, a co-expression screening of 21 secretory pathway components validated ATF4, SRP9, JUN, PDIA3 and HSPA8 as productivity boosters in CHO. Moreover, more heavily glycosylated products benefitted more from the elevated activities of the N- and O-glycosyltransferases found in HEK293. Collectively, our results demonstrate the utilization of HEK293 for expression rescue of human proteins and suggest a methodology for identification of secretory pathway components for metabolic engineering of HEK293 and CHO.
Affiliation(s)
- Magdalena Malm
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Chih-Chung Kuo
  - Departments of Pediatrics and Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA; The Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, CA, 92093, USA
- Mona Moradi Barzadd
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Aman Mebrahtu
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Num Wistbacka
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Ronia Razavi
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Anna-Luisa Volk
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Magnus Lundqvist
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- David Kotol
  - Science for Life Laboratory, KTH - Royal Institute of Technology, Solna, 171 65, Sweden
- Hanna Tegel
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Sophia Hober
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Fredrik Edfors
  - Science for Life Laboratory, KTH - Royal Institute of Technology, Solna, 171 65, Sweden
- Torbjörn Gräslund
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
- Veronique Chotteau
  - Dept. of Industrial Biotechnology, KTH - Royal Institute of Technology, Stockholm, SE-10691, Sweden
- Ray Field
  - Cell Culture and Fermentation Sciences, BioPharmaceutical Development, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Paul G Varley
  - Cell Culture and Fermentation Sciences, BioPharmaceutical Development, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Robert G Roth
  - Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Nathan E Lewis
  - Departments of Pediatrics and Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA; The Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, CA, 92093, USA
- Diane Hatton
  - Cell Culture and Fermentation Sciences, BioPharmaceutical Development, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Johan Rockberg
  - Dept. of Protein Science, KTH - Royal Institute of Technology, Stockholm, SE-106 91, Sweden
9
Shrestha AMS, B Guiao JE, R Santiago KC. Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment. BMC Genomics 2022; 23:97. [PMID: 35120462 PMCID: PMC8815227 DOI: 10.1186/s12864-021-08278-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2021] [Accepted: 12/22/2021] [Indexed: 11/16/2022] Open
Abstract
Background: RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. For organisms that lack a well-annotated reference genome or transcriptome, a conventional RNA-seq data analysis workflow requires constructing a de-novo transcriptome assembly and annotating it against a high-confidence protein database. The assembly serves as a reference for read mapping, and the annotation is necessary for functional analysis of genes found to be differentially expressed. However, assembly is computationally expensive. It is also prone to errors that impact expression analysis, especially since sequencing depth is typically much lower for expression studies than for transcript discovery.
Results: We propose a shortcut, in which we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the high-confidence proteome that would have been otherwise used for annotation. By avoiding assembly, we drastically cut down computational costs – the running time on a typical dataset improves from the order of tens of hours to under half an hour, and the memory requirement is reduced from the order of tens of Gbytes to tens of Mbytes. We show through experiments on simulated and real data that our pipeline not only reduces computational costs, but has higher sensitivity and precision than a typical assembly-based pipeline. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar.
Conclusions: The flip side of RNA-seq becoming accessible to even modestly resourced labs has been that the time, labor, and infrastructure cost of bioinformatics analysis has become a bottleneck. Assembly is one such resource-hungry process, and we show here that it can be avoided for quick and easy, yet more sensitive and precise, differential gene expression analysis in non-model organisms.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-08278-7.
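The counting step described in this abstract can be pictured as a tally of reads per reference protein. The sketch below assumes each read has already been assigned a protein by some aligner; the input format, the sample names, the protein identifiers, and the helpers `count_hits` and `count_matrix` are hypothetical and are not taken from the authors' Snakemake workflow.

```python
from collections import Counter


def count_hits(best_hits):
    """Tally reads per protein from (read_id, protein_id) pairs.

    Only the first hit seen for each read is counted, so a read listed
    twice does not inflate the counts (a simplification of this sketch).
    """
    seen = set()
    counts = Counter()
    for read_id, protein_id in best_hits:
        if read_id in seen:
            continue
        seen.add(read_id)
        counts[protein_id] += 1
    return counts


def count_matrix(samples):
    """Build {protein: [count in each sample]} in the samples' insertion order."""
    per_sample = {name: count_hits(hits) for name, hits in samples.items()}
    proteins = sorted(set().union(*per_sample.values()))
    return {p: [per_sample[name][p] for name in samples] for p in proteins}


# Hypothetical alignment output for two samples.
samples = {
    "s1": [("r1", "P1"), ("r2", "P1"), ("r3", "P2")],
    "s2": [("r1", "P2"), ("r1", "P1")],
}
matrix = count_matrix(samples)
```

A protein-by-sample table of this shape is the usual input to count-based differential expression tools.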
Affiliation(s)
- Anish M S Shrestha
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Joyce Emlyn B Guiao
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Mathematics and Statistics, College of Science, De La Salle University, Manila, Philippines
- Kyle Christian R Santiago
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
10
Pinel GD, Horder JL, King JR, McIntyre A, Mongan NP, López GG, Benest AV. Endothelial Cell RNA-Seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching. Methods Mol Biol 2022; 2441:369-426. [PMID: 35099752 DOI: 10.1007/978-1-0716-2059-5_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
RNA-seq is a common approach used to explore gene expression data between experimental conditions or cell types and ultimately leads to information that can shed light on the biological processes involved and inform further hypotheses. While the protocols required to generate samples for sequencing can be performed in most research facilities, the resulting computational analysis is often an area in which researchers have little experience. Here we present a user-friendly bioinformatics workflow which describes the methods required to take raw data produced by RNA sequencing to interpretable results. Widely used and well documented tools are applied. Data quality assessment and read trimming were performed by FastQC and Cutadapt, respectively. Following this, STAR was utilized to map the trimmed reads to a reference genome and the alignment was analyzed by Qualimap. The subsequent mapped reads were quantified by featureCounts. DESeq2 was used to normalize and perform differential expression analysis on the quantified reads, identifying differentially expressed genes and preparing the data for functional enrichment analysis. Gene set enrichment analysis identified enriched gene sets from the normalized count data and clusterProfiler was used to perform functional enrichment against the GO, KEGG, and Reactome databases. Example figures of the functional enrichment analysis results were also generated. The example data used in the workflow are derived from HUVECs, an in vitro model used in the study of endothelial cells, published and publicly available for download from the European Nucleotide Archive.
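The normalization step mentioned in this abstract is commonly described as a median-of-ratios scheme. The sketch below is a plain-Python illustration of that idea under stated assumptions, not the DESeq2 API and not the chapter's code; the function names `size_factors` and `normalize` and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts maps a gene name to a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v <= 0 for v in values):
            continue
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # The median log ratio per sample, back on the linear scale.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return {g: [v / f for v, f in zip(vals, factors)] for g, vals in counts.items()}


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = {"A": [10, 20], "B": [100, 200], "C": [50, 100], "D": [0, 5]}
factors = size_factors(raw)
normalized = normalize(raw, factors)
```

After normalization, genes A, B and C have equal values in both samples, which is the intended effect when the only difference between samples is sequencing depth. The sketch raises an error if every gene contains a zero count, a case a production tool would need to handle.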
Affiliation(s)
- Guillermo Díez Pinel
- Neuronal and Vascular Biology Group, UCL Institute of Ophthalmology, University College London, London, UK
- Joseph L Horder
- Endothelial Quiescence Group, Centre for Cancer Sciences, Biodiscovery Institute, School of Medicine, University of Nottingham, Nottingham, UK
- John R King
- School of Mathematics, Faculty of Science, University of Nottingham, Nottingham, UK
- Alan McIntyre
- Hypoxia and Acidosis Group, Center for Cancer Sciences, Biodiscovery Institute, University of Nottingham, Nottingham, UK
- Nigel P Mongan
- School of Veterinary Medicine and Science, Biodiscovery Institute, University of Nottingham, Nottingham, UK
- Department of Pharmacology, Weill Cornell Medicine, New York, NY, USA
- Gonzalo Gómez López
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Andrew V Benest
- Endothelial Quiescence Group, Centre for Cancer Sciences, Biodiscovery Institute, School of Medicine, University of Nottingham, Nottingham, UK
11
Li W, Ding Z, Wang D, Li C, Pan Y, Zhao Y, Zhao H, Lu T, Xu R, Zhang S, Yuan B, Zhao Y, Yin Y, Gao Y, Li J, Yan M. Ten-gene signature reveals the significance of clinical prognosis and immuno-correlation of osteosarcoma and study on novel skeleton inhibitors regarding MMP9. Cancer Cell Int 2021; 21:377. [PMID: 34261456 PMCID: PMC8281696 DOI: 10.1186/s12935-021-02041-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/19/2021] [Accepted: 06/24/2021] [Indexed: 02/08/2023] Open
Abstract
OBJECTIVES This study aimed to identify novel targets in the carcinogenesis, therapy and prognosis of osteosarcoma at the genomic level, and to screen ideal lead compounds with potential inhibitory activity against MMP9. METHODS Gene expression profiles from GSE12865, GSE14359, GSE33382, GSE36001 and GSE99671 were obtained from the GEO database. Differentially expressed genes were identified, and functional enrichment analyses, including GO, KEGG, GSEA and PPI, were performed to build a comprehensive understanding of the hub genes. Next, a series of high-precision computational techniques was applied to screen potential lead compounds targeting MMP9, including virtual screening, ADME, toxicity prediction, and accurate docking analysis. RESULTS Ten genes (MMP9, CD74, SPP1, CXCL12, TYROBP, FCER1G, HCLS1, ARHGDIB, LAPTM5 and IGF1R) were identified as hub genes in the initiation of osteosarcoma. Machine learning, multivariate Cox analysis, ssGSEA and survival analysis demonstrated that these genes had value in prognosis, immune correlation and targeted treatment. Two novel compounds, ZINC000072131515 and ZINC000004228235, were screened as potential inhibitors of MMP9, and they could bind to MMP9 with favorable interaction energy and high binding affinity. They were also predicted to be efficient and safe drugs with low Ames mutagenicity, no weight-of-evidence carcinogenicity, and no liver toxicity. CONCLUSIONS This study revealed the significance of the 10-gene signature in the development of osteosarcoma. In addition, the drug candidates identified in this study provide a solid basis for the development of MMP9 inhibitors.
Affiliation(s)
- Weihang Li
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
- Ziyi Ding
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
- Dong Wang
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
- Chengfei Li
- School of Aerospace Medicine, Fourth Military Medical University, 169 Chang Le Xi Road, Xi'an, 710032, Shaanxi, China
- Yikai Pan
- School of Aerospace Medicine, Fourth Military Medical University, 169 Chang Le Xi Road, Xi'an, 710032, Shaanxi, China
- Yingjing Zhao
- Department of Intensive Care Unit, Nanjing First Hospital, Nanjing Medical University, Nanjing, 210006, Jiangsu, China
- Hongzhe Zhao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Tianxing Lu
- Hou Zonglian Medical Experimental Class, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Rui Xu
- Department of Endocrinology, Shanghai National Research Center for Endocrine and Metabolic Disease, State Key Laboratory of Medical Genomics, Shanghai Institute for Endocrine and Metabolic Disease, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Shilei Zhang
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
- Bin Yuan
- Department of Spine Surgery, Daxing Hospital, Xi'an, Shaanxi, China
- Yunlong Zhao
- College of Clinical Medicine, Jilin University, Changchun, China
- Yanjiang Yin
- Department of Hepatobiliary Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yuan Gao
- School of Aerospace Medicine, Fourth Military Medical University, 169 Chang Le Xi Road, Xi'an, 710032, Shaanxi, China
- Jing Li
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
- Ming Yan
- Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi'an, People's Republic of China
12
Stupnikov A, McInerney CE, Savage KI, McIntosh SA, Emmert-Streib F, Kennedy R, Salto-Tellez M, Prise KM, McArt DG. Robustness of differential gene expression analysis of RNA-seq. Comput Struct Biotechnol J 2021; 19:3470-3481. [PMID: 34188784 PMCID: PMC8214188 DOI: 10.1016/j.csbj.2021.05.040] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/03/2020] [Revised: 05/25/2021] [Accepted: 05/25/2021] [Indexed: 01/05/2023] Open
Abstract
RNA-sequencing (RNA-seq) is a relatively new technology that lacks standardisation. RNA-seq can be used for Differential Gene Expression (DGE) analysis; however, no consensus exists as to which methodology ensures robust and reproducible results. Indeed, it is broadly acknowledged that DGE methods provide disparate results. Despite these obstacles, RNA-seq assays are in advanced development for clinical use, but further optimisation will be needed. Herein, five DGE models (DESeq2, voom + limma, edgeR, EBSeq, NOISeq) for gene-level detection were investigated for robustness to sequencing alterations using a controlled analysis of fixed count matrices. Two breast cancer datasets were analysed with full and reduced sample sizes. DGE model robustness was compared between filtering regimes and for different expression levels (high, low) using unbiased metrics. Test sensitivity estimated as relative False Discovery Rate (FDR), concordance between model outputs, and comparisons of a 'population' of slopes of relative FDRs across different library sizes, generated using linear regressions, were examined. Patterns of relative DGE model robustness proved dataset-agnostic and reliable for drawing conclusions when sample sizes were sufficiently large. Overall, the non-parametric method NOISeq was the most robust, followed by edgeR, voom, EBSeq and DESeq2. Our rigorous appraisal provides information for method selection for molecular diagnostics. The metrics may prove useful towards improving the standardisation of RNA-seq for precision medicine.
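One simple way to quantify concordance between the outputs of several DGE methods is the pairwise Jaccard index of their differentially expressed gene sets. The sketch below assumes that choice of metric; the abstract does not say which concordance measure the authors used, and the gene calls are made-up placeholders rather than results from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical DEG calls per method (placeholder gene identifiers).
deg_calls = {
    "DESeq2": {"G1", "G2", "G3", "G4"},
    "edgeR": {"G1", "G2", "G3", "G5"},
    "NOISeq": {"G1", "G2"},
}

# Pairwise concordance for every unordered pair of methods.
concordance = {
    (m1, m2): jaccard(deg_calls[m1], deg_calls[m2])
    for m1, m2 in combinations(sorted(deg_calls), 2)
}

for pair, score in concordance.items():
    print(pair, round(score, 2))
```

A value of 1.0 means two methods call exactly the same genes, and 0.0 means they share none.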
Affiliation(s)
- A Stupnikov
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Russian Federation
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- C E McInerney
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- K I Savage
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- S A McIntosh
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- F Emmert-Streib
- Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
- R Kennedy
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- M Salto-Tellez
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- K M Prise
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
- D G McArt
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
13
Ma Z, Xu J, Ru L, Zhu W. Identification of pivotal genes associated with the prognosis of gastric carcinoma through integrated analysis. Biosci Rep 2021; 41:BSR20203676. [PMID: 33754626 DOI: 10.1042/BSR20203676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Revised: 03/18/2021] [Accepted: 03/22/2021] [Indexed: 12/13/2022] Open
Abstract
PURPOSE Detecting and diagnosing gastric cancer (GC) at an early stage remains very difficult. Our analysis was performed to detect core genes correlated with GC and explore their prognostic values. METHODS Microarray datasets from the Gene Expression Omnibus (GEO) (GSE54129) and The Cancer Genome Atlas (TCGA)-stomach adenocarcinoma (STAD) datasets were used to identify common differentially co-expressed genes using differential gene expression analysis and Weighted Gene Co-expression Network Analysis (WGCNA). Functional enrichment analysis and protein-protein interaction (PPI) network analysis of differentially co-expressed genes were performed. We identified hub genes via the CytoHubba plugin. Prognostic values of hub genes were explored. Afterward, Gene Set Enrichment Analysis (GSEA) was used to analyze survival-related hub genes. Finally, the tumor-infiltrating immune cell (TIC) abundance profiles were estimated. RESULTS Sixty common differentially co-expressed genes were found. Functional enrichment analysis implied that cell-cell junction organization and cell adhesion molecules were primarily enriched. Hub genes were identified using the degree, edge percolated component (EPC), maximal clique centrality (MCC), and maximum neighborhood component (MNC) algorithms, and serpin family E member 1 (SERPINE1) was highly associated with the prognosis of GC patients. Moreover, GSEA demonstrated that extracellular matrix (ECM) receptor interactions and pathways in cancers were correlated with SERPINE1 expression. CIBERSORT analysis of the proportion of TICs suggested that CD8+ T cell and T-cell regulation were negatively associated with SERPINE1 expression, showing that SERPINE1 may inhibit the immune-dominant status of the tumor microenvironment (TME) in GC. CONCLUSIONS Our analysis shows that SERPINE1 is closely correlated with the tumorigenesis and progression of GC. Furthermore, SERPINE1 is a candidate therapeutic target and prognostic biomarker of GC.
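Of the four hub-gene rankings named above, the degree method is the simplest to illustrate: a node's score is the number of distinct partners it has in the PPI network. The sketch below is a plain-Python illustration of that idea, not the CytoHubba implementation, and the edge list uses placeholder node names rather than interactions from the study.

```python
def rank_by_degree(edges):
    """Rank nodes of an undirected network by their number of distinct neighbors."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(((node, len(nb)) for node, nb in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical PPI edges between placeholder proteins.
ppi_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

ranking = rank_by_degree(ppi_edges)
top_hubs = [node for node, _ in ranking[:2]]
print(ranking)
```

In this toy network, node A has three partners and ranks first; the other algorithms listed in the abstract (EPC, MCC, MNC) use different topological scores and can rank nodes differently.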
14
Takeuchi F, Kato N. Nonlinear ridge regression improves cell-type-specific differential expression analysis. BMC Bioinformatics 2021; 22:141. [PMID: 33752591 PMCID: PMC7986289 DOI: 10.1186/s12859-021-03982-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/25/2020] [Accepted: 01/27/2021] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on the omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, cell type composition of a sample can be inferred statistically. Subsequently, cell-type-specific effects are estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues, scaling and multicollinearity. RESULTS First, although cell composition is analyzed in linear scale, differential methylation/expression is analyzed suitably in the logit/log scale. To simultaneously analyze two scales, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which is obstructive to ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. Marginal model attained the lowest precision and highest sensitivity and was the only algorithm to detect weak signal in real data. CONCLUSION Nonlinear ridge regression performed cell-type-specific association test on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas.
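The interaction model and the ridge penalty described above can be sketched on a toy example with two cell types. This is a simplified linear version with a closed-form ridge estimate, not the nonlinear method implemented in omicwas. The sample values, the coefficients, and the choice to penalize every coefficient equally are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam * I)^-1 X'y, penalizing all coefficients."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(w1, trait):
    """Columns: cell-type proportions, then proportion-by-trait interaction terms."""
    w2 = 1.0 - w1
    return [w1, w2, w1 * trait, w2 * trait]


# Hypothetical samples: (proportion of cell type 1, binary trait).
samples = [(0.2, 0), (0.5, 0), (0.8, 0), (0.3, 1), (0.6, 1), (0.9, 1)]
# Hypothetical truth: baseline per cell type, then trait effect per cell type.
true_beta = [1.0, 2.0, 0.5, -1.0]

X = [design_row(w1, t) for w1, t in samples]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in X]  # noise-free bulk signal

beta_ols = ridge(X, y, 0.0)    # no penalty: ordinary least squares
beta_ridge = ridge(X, y, 1.0)  # penalized: coefficients shrink toward zero
print(beta_ols, beta_ridge)
```

With noise-free data and no penalty the fit recovers the assumed cell-type-specific effects; adding the penalty trades some bias for stability, which is the motivation for ridge regularization when the interaction columns are highly collinear.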
Affiliation(s)
- Fumihiko Takeuchi
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
- Norihiro Kato
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
15
Jia R, Weng Y, Li Z, Liang W, Ji Y, Liang Y, Ning P. Bioinformatics Analysis Identifies IL6ST as a Potential Tumor Suppressor Gene for Triple-Negative Breast Cancer. Reprod Sci 2021; 28:2331-2341. [PMID: 33650093 DOI: 10.1007/s43032-021-00509-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Accepted: 02/21/2021] [Indexed: 12/11/2022]
Abstract
Improved insight into the molecular mechanisms of triple-negative breast cancer (TNBC) is required to predict prognosis and develop a new therapeutic strategy for targeted genes. The aim of this study was to identify genes significantly associated with TNBC and further analyze their prognostic significance. The Cancer Genome Atlas (TCGA) TNBC database and gene expression profiles of GSE76275 from Gene Expression Omnibus (GEO) were used to explore differentially co-expressed genes in TNBC compared with those in normal tissues and non-TNBC breast cancer tissues. Differential gene expression and weighted gene co-expression network analyses identified 24 differentially co-expressed genes. Functional annotation suggested that these genes were primarily enriched in processes such as metabolism, membrane, and protein binding. The protein-protein interaction (PPI) network further identified ten hub genes, five of which (MAPT, CBS, SOX11, IL6ST, and MEX3A) were confirmed to be differentially expressed in an independent dataset (GSE38959). Moreover, CBS and MEX3A expression was upregulated, whereas IL6ST expression was downregulated in TNBC tissues compared to that in other breast cancer subtypes. Furthermore, lower expression of IL6ST was associated with worse overall survival in patients with TNBC. Thus, IL6ST might play an important role in TNBC progression and could serve as a tumor suppressor gene for diagnosis and treatment.
Affiliation(s)
- Rong Jia
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Yujie Weng
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Zhongxian Li
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Wei Liang
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Yucheng Ji
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Ying Liang
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Pengfei Ning
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
16
Minadakis G, Sokratous K, Spyrou GM. ProtExA: A tool for post-processing proteomics data providing differential expression metrics, co-expression networks and functional analytics. Comput Struct Biotechnol J 2020; 18:1695-1703. [PMID: 32670509 PMCID: PMC7340977 DOI: 10.1016/j.csbj.2020.06.036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2020] [Revised: 06/17/2020] [Accepted: 06/20/2020] [Indexed: 12/31/2022] Open
Abstract
ProTExA is a web-tool that provides a post-processing workflow for the analysis of protein and gene expression datasets. Using network-based bioinformatics approaches, ProTExA facilitates differential expression analysis and co-expression network analysis as well as pathway and post-pathway analysis. Specifically, for a given set of protein-gene expression data across samples, ProTExA: (1) performs statistical analysis and filtering to highlight the differentially expressed proteins-genes, (2) performs enrichment analysis to identify top-scored pathways, (3) generates pathway-to-pathway and pathway-to-gene networks, (4) generates protein and gene co-expression networks using a variety of methodologies, and (5) applies clustering methodologies to identify sub-networks of co-expressed proteins-genes. The proposed web-tool is a simple yet informative aid to understanding and exploiting protein and gene expression datasets, especially for those who do not have the expertise and local resources to replicate specific analyses in the context of collaborative and scientific data exchange.
Affiliation(s)
- George Minadakis
- Department of Bioinformatics, The Cyprus Institute of Neurology & Genetics, 6 International Airport Avenue, 2370 Nicosia, P.O. Box 23462, 1683 Nicosia, Cyprus
- The Cyprus School of Molecular Medicine, The Cyprus Institute of Neurology & Genetics, 6 International Airport Avenue, 2370 Nicosia, P.O. Box 23462, 1683 Nicosia, Cyprus
- Kleitos Sokratous
- Department of Bioinformatics, The Cyprus Institute of Neurology & Genetics, 6 International Airport Avenue, 2370 Nicosia, P.O. Box 23462, 1683 Nicosia, Cyprus
- OMass Therapeutics, The Schrödinger Building, Heatley Road, The Oxford Science Park, Oxford OX4 4GE, UK
- George M Spyrou
- Department of Bioinformatics, The Cyprus Institute of Neurology & Genetics, 6 International Airport Avenue, 2370 Nicosia, P.O. Box 23462, 1683 Nicosia, Cyprus
- The Cyprus School of Molecular Medicine, The Cyprus Institute of Neurology & Genetics, 6 International Airport Avenue, 2370 Nicosia, P.O. Box 23462, 1683 Nicosia, Cyprus
17
Choi J, Topouza DG, Tarnouskaya A, Nesdoly S, Koti M, Duan QL. Gene networks and expression quantitative trait loci associated with adjuvant chemotherapy response in high-grade serous ovarian cancer. BMC Cancer 2020; 20:413. [PMID: 32404140 PMCID: PMC7218510 DOI: 10.1186/s12885-020-06922-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/01/2020] [Accepted: 04/30/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND A major impediment in the treatment of ovarian cancer is the relapse of chemotherapy-resistant tumors, which occurs in approximately 25% of patients. A better understanding of the biological mechanisms underlying chemotherapy resistance will improve treatment efficacy through genetic testing and novel therapies. METHODS Using data from high-grade serous ovarian carcinoma (HGSOC) patients in the Cancer Genome Atlas (TCGA), we classified those who remained progression-free for 12 months following platinum-taxane combination chemotherapy as "chemo-sensitive" (N = 160) and those who had recurrence within 6 months as "chemo-resistant" (N = 110). Univariate and multivariate analysis of expression microarray data were used to identify differentially expressed genes and co-expression gene networks associated with chemotherapy response. Moreover, we integrated genomics data to determine expression quantitative trait loci (eQTL). RESULTS Differential expression of the Valosin-containing protein (VCP) gene and five co-expression gene networks were significantly associated with chemotherapy response in HGSOC. VCP and the most significant co-expression network module contribute to protein processing in the endoplasmic reticulum, which has been implicated in chemotherapy response. Both univariate and multivariate analysis findings were successfully replicated in an independent ovarian cancer cohort. Furthermore, we identified 192 cis-eQTLs associated with the expression of network genes and 4 cis-eQTLs associated with BRCA2 expression. CONCLUSION This study implicates both known and novel genes as well as biological processes underlying response to platinum-taxane-based chemotherapy among HGSOC patients.
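The response labels defined in the Methods can be written as a small rule. The sketch below is an interpretation, not the authors' code: the function name, the time origin (months counted from the end of chemotherapy), the handling of the exact 6- and 12-month boundaries, and the decision to leave all other patients unlabeled are assumptions.

```python
from typing import Optional


def classify_response(months_followed: float, recurred: bool) -> Optional[str]:
    """Label a patient from time to recurrence (if recurred) or to last follow-up.

    Returns None for patients who fit neither definition, for example a
    recurrence between 6 and 12 months or follow-up shorter than 12 months.
    """
    if recurred and months_followed < 6:
        return "chemo-resistant"
    if months_followed >= 12:
        # Progression-free for at least 12 months, whether or not a
        # recurrence was recorded later.
        return "chemo-sensitive"
    return None


# Hypothetical patients: (months, recurred).
patients = [(3, True), (14, False), (8, True), (3, False), (20, True)]
labels = [classify_response(months, rec) for months, rec in patients]
print(labels)
```

Leaving the intermediate group unlabeled keeps the two classes well separated, at the cost of discarding patients who fall between the thresholds.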
Affiliation(s)
- Jihoon Choi
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- Danai G Topouza
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- Sean Nesdoly
- School of Computing, Queen's University, Kingston, Ontario, Canada
- Madhuri Koti
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- Qing Ling Duan
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- School of Computing, Queen's University, Kingston, Ontario, Canada
18
Agarwal S, Kashaw SK. Potential target identification for breast cancer and screening of small molecule inhibitors: A bioinformatics approach. J Biomol Struct Dyn 2020; 39:1975-1989. [PMID: 32186248 DOI: 10.1080/07391102.2020.1743757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
In the current study, we investigated the role of the PAK1 (P21 (RAC1) Activated Kinase 1) gene in breast cancer. To this end, we performed differential gene expression analysis of PAK1 in breast cancer tissues compared with normal adjacent tissue. We also studied its significance in the protein-protein interaction (PPI) network and analysed biological pathways, cellular processes, and the role of PAK1 in different diseases. We found PAK1 to have a significant role in breast cancer pathways such as integrin signaling, axonal guidance signaling, signaling by Rho family GTPases, and ERK5 signaling. Additionally, it was found to be a hub gene in the PPI network, suggesting a possible regulatory role in breast carcinogenesis. Moreover, PAK1 had a role in the progression of various diseases such as neoplasia, tumorigenesis, and lymphatic neoplasia. Thus, PAK1 can be used as a therapeutic target in breast cancer. Further, we sought to identify potential small molecule inhibitors of PAK1 by developing a composite virtual screening protocol involving molecular dynamics (MD) and molecular docking. Chemical libraries of compounds from the NCI diversity sets, PubChem and eMolecules were screened against the PAK1 protein, and hits that showed good binding affinity were considered for MD simulation study. Moreover, to assess binding of the selected hits, MMGBSA (Molecular Mechanics-Generalized Born Surface Area) analysis was performed using the AMBER (Assisted Model Building with Energy Refinement) package. MMGBSA calculations showed that the identified ligands had good binding affinity for PAK1.
Highlights
- PAK1 was found to be upregulated in breast cancer samples and is a potential oncogene with roles in different cellular functions and processes.
- Molecular docking studies showed that the ligands had good binding affinity towards the PAK1 protein.
- The residues Glu345, Leu347, Thr406, Asp299, Asp393 and Gly350 were found to make H-bond interactions with small molecule inhibitors.
- The residues Ile276, Val284, Ala297, Tyr346, Leu396 and Asp407 were found to make hydrophobic interactions.
- The RMSD analysis confirmed stability of the complexes throughout the 40 ns production period.
- The MD simulation studies revealed the binding site flexibility, the binding free energy of the complexes and the per-residue contribution to ligand binding.
Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shivangi Agarwal
- Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, MP, India
- Sushil K Kashaw
- Department of Pharmaceutical Sciences, Dr. Harisingh Gour University (A Central University), Sagar, MP, India
19
Walker LA, Sovic MG, Chiang CL, Hu E, Denninger JK, Chen X, Kirby ED, Byrd JC, Muthusamy N, Bundschuh R, Yan P. CLEAR: coverage-based limiting-cell experiment analysis for RNA-seq. J Transl Med 2020; 18:63. [PMID: 32039730 PMCID: PMC7008572 DOI: 10.1186/s12967-020-02247-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/05/2019] [Accepted: 01/28/2020] [Indexed: 01/07/2023] Open
Abstract
Background: Direct cDNA preamplification protocols developed for single-cell RNA-seq have enabled transcriptome profiling of precious clinical samples and rare cell populations without the need for sample pooling or RNA extraction. We term the use of single-cell chemistries for sequencing low numbers of cells limiting-cell RNA-seq (lcRNA-seq). Currently, there is no customized algorithm to select robust/low-noise transcripts from lcRNA-seq data for between-group comparisons. Methods: Herein, we present CLEAR, a workflow that identifies reliably quantifiable transcripts in lcRNA-seq data for differentially expressed gene (DEG) analysis. Total RNA obtained from primary chronic lymphocytic leukemia (CLL) CD5+ and CD5− cells was used to develop the CLEAR algorithm. Once established, the performance of CLEAR was evaluated with FACS-sorted cells enriched from mouse Dentate Gyrus (DG). Results: When using CLEAR transcripts vs. using all transcripts in CLL samples, downstream analyses revealed a higher proportion of shared transcripts across three input amounts and improved principal component analysis (PCA) separation of the two cell types. In mouse DG samples, CLEAR identifies noisy transcripts, and their removal improves PCA separation of the anticipated cell populations. In addition, CLEAR was applied to two publicly available datasets to demonstrate its utility in lcRNA-seq data from other institutions. If imputation is applied to limit the effect of missing data points, CLEAR can also be used in large clinical trials and in single cell studies. Conclusions: lcRNA-seq coupled with CLEAR is widely used in our institution for profiling immune cells (circulating or tissue-infiltrating) for its transcript preservation characteristics. CLEAR fills an important niche in pre-processing lcRNA-seq data to facilitate transcriptome profiling and DEG analysis.
We demonstrate the utility of CLEAR in analyzing rare cell populations in clinical samples and in murine neural DG region without sample pooling.
Affiliation(s)
- Logan A Walker
- Department of Physics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Michael G Sovic
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Chi-Ling Chiang
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
- Eileen Hu
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
- Jiyeon K Denninger
- Department of Psychology, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Xi Chen
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Elizabeth D Kirby
- Department of Psychology, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Chronic Brain Injury Program, The Ohio State University, Columbus, OH, USA
- John C Byrd
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
- Natarajan Muthusamy
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
- Ralf Bundschuh
- Department of Physics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
- Department of Chemistry & Biochemistry, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Center for RNA Biology, The Ohio State University, Columbus, OH, USA
- Pearlly Yan
- The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Division of Hematology, Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, USA
20
Zhang Y, Wan C, Wang P, Chang W, Huo Y, Chen J, Ma Q, Cao S, Zhang C. M3S: a comprehensive model selection for multi-modal single-cell RNA sequencing data. BMC Bioinformatics 2019; 20:672. [PMID: 31861972 PMCID: PMC6923906 DOI: 10.1186/s12859-019-3243-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022] Open
Abstract
Background: Various statistical models have been developed to model single-cell RNA-seq expression profiles, capture their multimodality, and conduct differential gene expression tests. However, for expression data generated by different experimental designs and platforms, there is currently no way to determine the most appropriate statistical model. Results: We developed an R package, Multi-Modal Model Selection (M3S), for gene-wise selection of the most appropriate multi-modality statistical model and for downstream analysis of single-cell or large-scale bulk tissue transcriptomic data. M3S features (1) gene-wise selection, among the 11 most commonly used models, of the most parsimonious model that best fits the expression distribution of the gene, (2) parameter estimation for the selected model, and (3) a differential gene expression test based on the selected model. Conclusion: A comprehensive evaluation suggested that M3S can accurately capture multimodality in simulated and real single-cell data. An open source package is available through GitHub at https://github.com/zy26/M3S.
Affiliation(s)
- Yu Zhang
- MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, Colleges of Computer Science and Technology, Jilin University, Changchun, 130012, China; Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA
- Changlin Wan
- Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA; Department of Electronic Computer Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Pengcheng Wang
- Department of Computer Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Wennan Chang
- Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA; Department of Electronic Computer Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Yan Huo
- Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA; School of Fundamental Sciences, China Medical University, Shenyang, 110122, China
- Jian Chen
- Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200082, China
- Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Sha Cao
- Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA; Department of Biostatistics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA
- Chi Zhang
- Center for Computational Biology and Bioinformatics, Indiana University, School of Medicine, Indianapolis, 46202, IN, USA; Department of Electronic Computer Engineering, Purdue University, West Lafayette, IN, 47907, USA; Department of Medical and Molecular Genetics, Indianapolis, IN, 46202, USA
|
21
|
Wang T, Li B, Nelson CE, Nabavi S. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinformatics 2019; 20:40. [PMID: 30658573 PMCID: PMC6339299 DOI: 10.1186/s12859-019-2599-6] [Citation(s) in RCA: 140] [Impact Index Per Article: 28.0] [Received: 05/16/2018] [Accepted: 01/03/2019] [Indexed: 12/16/2022]
Abstract
Background: The analysis of single-cell RNA sequencing (scRNAseq) data plays an important role in understanding the intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the detection of differentially expressed (DE) genes. scRNAseq data, however, are highly heterogeneous and have a large number of zero counts, which introduces challenges in detecting DE genes. Addressing these challenges requires employing new approaches beyond the conventional ones, which are based on a nonzero difference in average expression. Several methods have been developed for differential gene expression analysis of scRNAseq data. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to evaluate and compare the performance of differential gene expression analysis methods for scRNAseq data. Results: In this study, we conducted a comprehensive evaluation of the performance of eleven differential gene expression analysis software tools, which are designed for scRNAseq data or can be applied to them. We used simulated and real data to evaluate the accuracy and precision of detection. Using simulated data, we investigated the effect of sample size on the detection accuracy of the tools. Using real data, we examined the agreement among the tools in identifying DE genes, the run time of the tools, and the biological relevance of the detected DE genes. Conclusions: In general, agreement among the tools in calling DE genes is not high. There is a trade-off between true-positive rates and the precision of calling DE genes. Methods with higher true positive rates tend to show low precision due to their introducing false positives, whereas methods with high precision show low true positive rates due to identifying few DE genes. We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data. Data multimodality and abundance of zero read counts are the main characteristics of scRNAseq data, which play important roles in the performance of differential gene expression analysis methods and need to be considered in terms of the development of new methods.
Affiliation(s)
- Tianyu Wang
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Boyang Li
- Department of Molecular & Cell Biology, University of Connecticut, Storrs, CT, USA
- Craig E Nelson
- Department of Molecular & Cell Biology, The Institute for Systems Genomics, CLAS, University of Connecticut, Storrs, CT, USA
- Sheida Nabavi
- Computer Science and Engineering Department, The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
|
22
|
Abstract
High-throughput techniques such as RNA-seq or microarray analysis have proven invaluable for characterizing global changes in transcriptional activity due to external stimuli or diseases. Differential gene expression analysis (DGEA) is the first step in data interpretation, typically producing lists of dozens to thousands of differentially expressed genes. To further guide the interpretation of these lists, different pathway analysis approaches have been developed. These tools typically rely on the classification of genes into gene sets, such as pathways, based on the interactions between the genes and their function in a common biological process. Regardless of technical differences, these methods do not properly account for crosstalk between different pathways, and most of them rely on a binary separation into differentially expressed and unaffected genes based on an arbitrarily set p-value cut-off. To overcome this limitation, we developed a novel approach to identify concertedly modulated sub-graphs in the global cell signaling network, based on the DGEA results of all genes tested. To this end, expression patterns of genes are integrated according to the topology of their interactions, which potentially makes it possible to read the flow of information and identify the effectors. The described software, named Modulated Sub-graph Finder (MSF), is freely available at https://github.com/Modulated-Subgraph-Finder/MSF.
Affiliation(s)
- Mariam R Farman
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
- Ivo L Hofacker
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
- Fabian Amman
- Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria; Department of Chromosome Biology, Max F. Perutz Laboratories, University of Vienna, Vienna, 1030, Austria
|
23
|
Wang T, Nabavi S. SigEMD: A powerful method for differential gene expression analysis in single-cell RNA sequencing data. Methods 2018; 145:25-32. [PMID: 29702224 DOI: 10.1016/j.ymeth.2018.04.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/01/2018] [Revised: 04/13/2018] [Accepted: 04/19/2018] [Indexed: 10/17/2022]
Abstract
Differential gene expression analysis is one of the significant efforts in single cell RNA sequencing (scRNAseq) analysis to discover the specific changes in expression levels of individual cell types. Since scRNAseq exhibits multimodality, large amounts of zero counts, and sparsity, it is different from the traditional bulk RNA sequencing (RNAseq) data. The new challenges of scRNAseq data promote the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance, to precisely and efficiently identify DE genes in scRNAseq data. The regression model and data imputation are used to reduce the impact of large amounts of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce the false positives of calling DE genes. We used simulated datasets and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with those of other differential expression analysis methods. Results indicate that the proposed method has an overall powerful performance in terms of precision in detection, sensitivity, and specificity.
Affiliation(s)
- Tianyu Wang
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Sheida Nabavi
- Computer Science and Engineering Department and Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
|
24
|
Abstract
Since next-generation sequencing (NGS) systems were invented and introduced to life science research about a decade ago, NGS technology has been extensively utilized in a wide range of genomic, transcriptomic, and evolutionary studies. Compared with other eukaryotic species, the application of NGS technology in plant research poses some challenges in sample preparation and data analysis, owing to the structural and physiological characteristics and genome complexity of plants. Hence, in addition to the standard sample preparation and data processing protocols widely used in high-throughput transcriptomic analysis, we also describe a modified hot borate RNA extraction protocol specific for isolating plant total RNA of high quality and quantity, along with comments and suggestions for better assessing RNA and library quality and for data analysis.
|
25
|
Abstract
Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets (one representing a strong, and another a mild, biological effect size), we simulated different replicate size scenarios and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods'. Conclusion. When biological effect size is weak, systems level investigation is not possible using RNAseq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.
Affiliation(s)
- Tsung Fei Khang
- Institute of Mathematical Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Ching Yee Lau
- Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
|
26
|
Maza E, Frasse P, Senin P, Bouzayen M, Zouine M. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 2013; 6:e25849. [PMID: 26442135 PMCID: PMC3918003 DOI: 10.4161/cib.25849] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 05/13/2013] [Revised: 07/22/2013] [Accepted: 07/22/2013] [Indexed: 11/19/2022]
Abstract
In recent years, RNA-Seq technologies have become a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of normalization procedure leads to great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one that aims to remove an inherent bias of the studied transcriptomes related to their relative size. The normalization procedures are compared on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Analyses of real and simulated data sets give similar results, showing 3 different groups of procedures with the same behavior. The group including the novel method, named “Median Ratio Normalization” (MRN), gives the lowest number of false discoveries. Within this group, the MRN method is less sensitive to changes in parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with the intrinsic bias resulting from the relative size of the studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods.
Affiliation(s)
- Elie Maza
- Université de Toulouse; INP-ENSA Toulouse; Laboratoire Génomique et Biotechnologie des Fruits; Castanet-Tolosan, France; INRA; Laboratoire Génomique et Biotechnologie des Fruits; Auzeville, Castanet-Tolosan, France
- Pierre Frasse
- Université de Toulouse; INP-ENSA Toulouse; Laboratoire Génomique et Biotechnologie des Fruits; Castanet-Tolosan, France; INRA; Laboratoire Génomique et Biotechnologie des Fruits; Auzeville, Castanet-Tolosan, France
- Pavel Senin
- Université de Toulouse; INP-ENSA Toulouse; Laboratoire Génomique et Biotechnologie des Fruits; Castanet-Tolosan, France; INRA; Laboratoire Génomique et Biotechnologie des Fruits; Auzeville, Castanet-Tolosan, France
- Mondher Bouzayen
- Université de Toulouse; INP-ENSA Toulouse; Laboratoire Génomique et Biotechnologie des Fruits; Castanet-Tolosan, France; INRA; Laboratoire Génomique et Biotechnologie des Fruits; Auzeville, Castanet-Tolosan, France
- Mohamed Zouine
- Université de Toulouse; INP-ENSA Toulouse; Laboratoire Génomique et Biotechnologie des Fruits; Castanet-Tolosan, France; INRA; Laboratoire Génomique et Biotechnologie des Fruits; Auzeville, Castanet-Tolosan, France
|