1
Identification of Key Gene Network Modules and Hub Genes Associated with Wheat Response to Biotic Stress Using Combined Microarray Meta-analysis and WGCN Analysis. Mol Biotechnol 2023; 65:453-465. [PMID: 35996047 DOI: 10.1007/s12033-022-00541-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2021] [Accepted: 07/05/2022] [Indexed: 12/31/2022]
Abstract
Wheat (Triticum aestivum) is one of the world's major crops and a primary source of calories in the human diet. Biotic stresses caused by fungi, bacteria, and other pathogens limit wheat production. Although plant breeding and genetic engineering for biotic stress resistance have been suggested as promising ways to reduce the losses caused by biotic stress factors, a comprehensive understanding of the underlying molecular mechanisms and the identification of key genes are critical steps toward success. Here, a network-based meta-analysis approach built on two main statistical methods was used to identify key genes and molecular mechanisms of the wheat response to biotic stress. A total of 163 samples (21,792 genes) from 10 datasets were analyzed. The Fisher Z test (based on p-values) and the random-effects model (REM; based on effect sizes) identified 533 differentially expressed genes (DEGs) (p < 0.001 and FDR < 0.001). A co-expression network was constructed with WGCNA, and a dynamic tree-cutting algorithm detected three significant modules. These modules were significantly enriched for 16 Gene Ontology biological process (BP) terms and 4 KEGG pathways (Benjamini-Hochberg FDR < 0.001). A total of nine hub genes (the top 1.5% of genes by degree) were identified from the constructed network. The identified DEGs, gene co-expression network, and hub genes may help uncover the molecular mechanisms of the wheat response to biotic stress.
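The p-value side of such a meta-analysis can be sketched with Fisher's combined probability test followed by Benjamini-Hochberg (BH) adjustment across genes. This is a minimal illustration under stated assumptions: we assume the abstract's p-value-based Fisher test corresponds to the classical Fisher's method (statistic -2 * sum(ln p_i), chi-square with 2k degrees of freedom for k independent datasets). The helper names, gene names, and p-values are hypothetical, and the study's actual software and settings are not described in the abstract.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the global null
    (k independent studies, each p-value uniform on (0, 1])."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-dataset p-values for three genes across three datasets.
    gene_pvalues = {
        "geneA": [1e-4, 3e-5, 2e-4],
        "geneB": [0.20, 0.45, 0.08],
        "geneC": [0.01, 0.03, 0.002],
    }
    genes = list(gene_pvalues)
    combined = [fisher_combined_pvalue(gene_pvalues[g]) for g in genes]
    fdr = benjamini_hochberg(combined)
    for g, p, q in zip(genes, combined, fdr):
        print(f"{g}: combined p = {p:.3g}, BH FDR = {q:.3g}")
```

The closed-form survival function avoids a SciPy dependency; it is exact only because Fisher's statistic always has an even number of degrees of freedom.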
2
Approaches in Gene Coexpression Analysis in Eukaryotes. BIOLOGY 2022; 11:biology11071019. [PMID: 36101400 PMCID: PMC9312353 DOI: 10.3390/biology11071019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/06/2022] [Revised: 06/28/2022] [Accepted: 07/04/2022] [Indexed: 11/22/2022]
Abstract
Simple Summary: Genes whose expression levels rise and fall similarly across a large set of samples may be considered coexpressed. Gene coexpression analysis refers to the en masse discovery of coexpressed genes from a large variety of transcriptomic experiments. Gene coexpression networks, the type of biological network used to study gene coexpression, are undirected graphs depicting genes and their coexpression relationships. Coexpressed genes are clustered into smaller subnetworks, whose predominant biological roles can be determined through enrichment analysis. Through a guilt-by-association approach, studying well-annotated gene partners makes it possible to attribute new roles to genes of unknown function or to propose their participation in common metabolic pathways. In this review, we present key issues in gene coexpression analysis, as well as the most popular tools that perform it.
Abstract: Gene coexpression analysis is a widely used practice for identifying gene partners and predicting gene function, and it consists of many intricate procedures. The analysis begins with the collection and preprocessing of primary transcriptomic data, continues with the calculation of the similarity between genes based on their expression values in the selected sample dataset, and results in the construction and visualisation of a gene coexpression network (GCN) and its evaluation using biological term enrichment analysis. Because gene coexpression analysis has been studied extensively, we present most parts of the methodology clearly, along with the reasoning behind the selection of some of the techniques. In this review, we offer a comprehensive and comprehensible account of the steps required for performing a complete gene coexpression analysis in eukaryotic organisms. We comment on the use of RNA-Seq vs. microarrays, as well as the best practices for GCN construction. Furthermore, we review the most popular webtools and standalone applications for gene coexpression analysis, with details on their methods, features and outputs.
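The similarity-then-network steps described in this abstract can be sketched as follows. This is a minimal illustration, assuming Pearson correlation as the similarity measure and a hard absolute-correlation threshold for adding edges; these choices, the threshold value, the helper names, and the toy expression values are all hypothetical, and the tools reviewed may instead use Spearman correlation, mutual information, or soft thresholding (as WGCNA does). Hub genes are taken as the most highly connected nodes, similar in spirit to the degree-based top-1.5% rule in the wheat study listed above.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return sxy / math.sqrt(sxx * syy)


def build_gcn(expr, threshold=0.8):
    """expr maps gene -> expression values in a shared sample order.
    Assumption: an edge is added when |r| >= threshold (unsigned network).
    Returns an undirected adjacency map: gene -> set of neighbours."""
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def hub_genes(adj, top_fraction=0.1):
    """Return the top fraction of genes ranked by degree (at least one)."""
    ranked = sorted(adj, key=lambda g: len(adj[g]), reverse=True)
    n = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical expression values for four genes across four samples.
    expr = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1],
        "D": [1, 3, 2, 4],
    }
    network = build_gcn(expr, threshold=0.9)
    print({g: sorted(nbrs) for g, nbrs in network.items()})
    print("hubs:", hub_genes(network, top_fraction=0.25))
```

Thresholding |r| yields an unsigned network in which strongly anti-correlated genes are also linked; thresholding r itself would give a signed network instead.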
3
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 471] [Impact Index Per Article: 67.3] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023]
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
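One simple way to make differential co-expression concrete is to test whether the correlation of a gene pair changes between two conditions using Fisher's z-transformation. This is a minimal sketch of that classical test, not a reproduction of any specific method from the review; it assumes Pearson correlations estimated from independent samples in each condition, and the helper name, correlation values, and sample sizes are hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for correlations estimated
    from independent samples of sizes n1 and n2 (each at least 4)."""
    if n1 < 4 or n2 < 4:
        raise ValueError("each condition needs at least 4 samples")
    # Clip to keep atanh finite for perfectly correlated pairs.
    eps = 1e-12
    r1 = max(-1 + eps, min(1 - eps, r1))
    r2 = max(-1 + eps, min(1 - eps, r2))
    # Fisher's z-transform makes each estimate approximately normal
    # with variance 1 / (n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical gene pair: r = 0.85 in 30 controls, r = 0.10 in 30 cases.
    z, p = compare_correlations(0.85, 30, 0.10, 30)
    print(f"z = {z:.2f}, p = {p:.2g}")
```

In a genome-wide analysis this test would be run for many gene pairs, so the resulting p-values would still need multiple-testing correction.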
Affiliation(s)
- Sipko van Dam
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Urmo Võsa
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Lude Franke
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
4
Zhang L, Gerson L, Maluf-Filho F. Systematic review and meta-analysis in GI endoscopy: Why do we need them? How can we read them? Should we trust them? Gastrointest Endosc 2018; 88:139-150. [PMID: 29526656 DOI: 10.1016/j.gie.2018.03.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/21/2017] [Accepted: 03/02/2018] [Indexed: 02/08/2023]
Affiliation(s)
- Lanjing Zhang
- Department of Pathology, University Medical Center of Princeton, Plainsboro, New Jersey, USA; Department of Biological Sciences, Rutgers University, Newark, New Jersey, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey, USA; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey, USA
- Lauren Gerson
- California Pacific Medical Center, San Francisco, California, USA
- Fauze Maluf-Filho
- Department of Gastroenterology of University of São Paulo, Institute of Cancer of University of São Paulo (ICESP-FMUSP), São Paulo, Brazil
5
Kawalia SB, Raschka T, Naz M, de Matos Simoes R, Senger P, Hofmann-Apitius M. Analytical Strategy to Prioritize Alzheimer's Disease Candidate Genes in Gene Regulatory Networks Using Public Expression Data. J Alzheimers Dis 2018; 59:1237-1254. [PMID: 28800327 PMCID: PMC5611835 DOI: 10.3233/jad-170011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
Alzheimer’s disease (AD) progressively destroys cognitive abilities in the aging population, with tremendous effects on memory. Despite recent progress in understanding the underlying mechanisms, high drug attrition rates have cast doubt on our knowledge of its etiology. Re-evaluating past studies could help elucidate molecular-level details of this disease. Several methods exist to infer gene regulatory networks from expression data, but most of them do not address the context specificity and completeness of the generated networks, missing lesser-known candidates. In this study, we present a novel strategy that corroborates common mechanistic patterns across large-scale AD gene expression studies and further prioritizes potential biomarker candidates. To infer gene regulatory networks (GRNs), we applied an optimized version of the BC3Net algorithm, named BC3Net10, which is capable of deriving robust and coherent patterns. This approach first leverages literature knowledge to extract AD-specific genes for generating viable networks. Our findings suggest that AD GRNs are significantly enriched for key signaling mechanisms involved in neurotransmission. Among the prioritized genes, well-known AD genes were prominent in synaptic transmission, which is implicated in cognitive deficits. Moreover, less intensively studied AD candidates (STX2, HLA-F, HLA-C, RAB11FIP4, ARAP3, AP2A2, ATP2B4, ITPR2, and ATP2A3) are also involved in neurotransmission, providing new insights into the underlying mechanism. To our knowledge, this is the first study to generate knowledge-instructed GRNs, demonstrating an effective way of combining literature-based knowledge and data-driven analysis to identify lesser-known candidates embedded in stable and robust functional patterns across disparate datasets.
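BC3Net builds on C3NET, whose core step keeps, for each gene, only its single strongest mutual-information (MI) edge. The sketch below illustrates that core step only, under several assumptions: MI is approximated from Pearson correlation with the Gaussian formula MI = -0.5 * ln(1 - r^2), and a fixed MI cut-off (`mi_min`, a hypothetical parameter) stands in for C3NET's significance testing. It omits the bootstrap aggregation that distinguishes BC3Net and the optimizations of BC3Net10, which the abstract does not describe; the helper names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Assumption: joint Gaussianity, so MI (in nats) = -0.5 * ln(1 - r^2)."""
    r = pearson(x, y)
    r2 = min(r * r, 1.0 - 1e-12)  # avoid log(0) for perfect correlation
    return -0.5 * math.log(1.0 - r2)


def c3net_core(expr, mi_min=0.1):
    """For every gene, keep the edge to its highest-MI partner if that MI
    reaches mi_min (a fixed cut-off standing in for a significance test).
    Returns undirected edges as sorted (gene, gene) tuples."""
    genes = list(expr)
    edges = set()
    for g in genes:
        candidates = [(gaussian_mi(expr[g], expr[h]), h) for h in genes if h != g]
        if not candidates:
            continue
        # max() on (mi, name) tuples breaks MI ties by gene name.
        best_mi, best = max(candidates)
        if best_mi >= mi_min:
            edges.add(tuple(sorted((g, best))))
    return edges


if __name__ == "__main__":
    # Hypothetical expression values for four genes across four samples.
    expr = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1],
        "D": [1, 3, 2, 4],
    }
    print(sorted(c3net_core(expr, mi_min=0.1)))
```

Because each gene contributes at most one edge, the resulting network has no more edges than genes, which is what makes C3NET conservative.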
Affiliation(s)
- Shweta Bagewadi Kawalia
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
- Tamara Raschka
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; University of Applied Sciences Koblenz, RheinAhrCampus, Remagen, Germany
- Mufassra Naz
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
- Philipp Senger
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
6
de Abreu Neto JB, Frei M. Microarray Meta-Analysis Focused on the Response of Genes Involved in Redox Homeostasis to Diverse Abiotic Stresses in Rice. FRONTIERS IN PLANT SCIENCE 2015; 6:1260. [PMID: 26793229 PMCID: PMC4709464 DOI: 10.3389/fpls.2015.01260] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/28/2015] [Accepted: 12/24/2015] [Indexed: 05/11/2023]
Abstract
Plants are exposed to a wide range of abiotic stresses (AS), which often occur in combination. Because physiological investigations typically focus on one stress, our understanding of unspecific stress responses remains limited. Plant redox homeostasis, i.e., the production and removal of reactive oxygen species (ROS), may be involved in many environmental stress conditions. Therefore, this study aimed to identify genes activated under diverse AS, focusing on ROS-related pathways. We conducted a meta-analysis (MA) of microarray experiments in rice. Transcriptome data were mined from public databases and obtained from fellow researchers; they represented 36 different experiments investigating diverse AS, including ozone stress, drought, heat, cold, salinity, and mineral deficiencies/toxicities. To overcome the inherent artifacts of different MA methods, data were processed using Fisher, rOP, REM, and product of rank (GeneSelector), and genes identified by most approaches were considered shared differentially expressed genes (DEGs). Two MA strategies were adopted. First, datasets were separated into shoot, root, and seedling experiments, and these tissues were analyzed separately to identify shared DEGs. Second, shoot and seedling experiments were classed into oxidative stress (OS), i.e., ozone and hydrogen peroxide treatments that directly produce ROS in plant tissue, and other AS, in which ROS production is indirect. In all tissues and stress conditions, genes considered a priori as ROS-related were overrepresented among the DEGs: they represented 4% of all expressed genes but 7-10% of the DEGs. The combined MA approach was substantially more conservative than individual MA methods and identified 1001 shared DEGs in shoots, 837 in roots, and 1172 in seedlings. Within the OS and AS groups, 990 and 1727 shared DEGs were identified, respectively. In total, 311 genes were shared between OS and AS, including many regulatory genes. Combined co-expression analysis identified among these a cluster of 42 genes, many of which are involved in the photosynthetic apparatus and respond to drought, iron deficiency, arsenic toxicity, and ozone. Our data demonstrate the importance of redox homeostasis in plant stress responses and the power of MA to identify candidate genes underlying unspecific signaling pathways.
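Two of the steps in this abstract can be sketched briefly: the rOP method, which uses the r-th smallest of a gene's n per-study p-values as its statistic, and the rule that a gene counts as a shared DEG when most MA methods call it. Under the null hypothesis with independent uniform p-values, the r-th order statistic follows a Beta(r, n - r + 1) distribution, so its p-value equals the probability that at least r of n uniform draws fall at or below it. This is a minimal sketch; the choice of r, the "at least 3 of 4 methods" reading of "most", the helper names, and the example data are assumptions, and the study's own implementation is not described in the abstract.

```python
import math


def rop_pvalue(pvalues, r):
    """rOP meta-analysis p-value for one gene.

    The statistic is the r-th smallest per-study p-value x. Under the null,
    P(X_(r) <= x) = P(Binomial(n, x) >= r), i.e. the Beta(r, n - r + 1) CDF.
    """
    n = len(pvalues)
    if not 1 <= r <= n:
        raise ValueError("r must be between 1 and the number of studies")
    x = sorted(pvalues)[r - 1]
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(r, n + 1))


def shared_degs(calls_by_method, min_methods):
    """Genes called significant by at least min_methods of the MA methods."""
    counts = {}
    for genes in calls_by_method.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return {g for g, c in counts.items() if c >= min_methods}


if __name__ == "__main__":
    # Hypothetical p-values for one gene across five experiments.
    print(rop_pvalue([0.001, 0.02, 0.03, 0.4, 0.7], r=3))
    # Hypothetical DEG calls from four MA methods; "most" taken as >= 3 of 4.
    calls = {
        "Fisher": {"g1", "g2", "g3"},
        "rOP": {"g1", "g2"},
        "REM": {"g1", "g3"},
        "rank product": {"g1", "g2"},
    }
    print(sorted(shared_degs(calls, min_methods=3)))
```

With r = 1, rOP reduces to the minimum-p-value method; larger r requires the gene to be significant in more studies, which is why rOP is often used to find consistently changed genes.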
7
Shi X, Yi H, Ma S. Measures for the degree of overlap of gene signatures and applications to TCGA. Brief Bioinform 2014; 16:735-44. [PMID: 25552438 DOI: 10.1093/bib/bbu049] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 09/01/2014] [Indexed: 11/12/2022]
Abstract
For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example; other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many existing studies simply count the number (or percentage) of genes shared by two signatures. Such an approach has serious limitations. As an illustrative example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted to generate gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tuning parameter values, and the second is based on estimates across the whole solution paths. Within each family, multiple measures that describe the overlap from different perspectives are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
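To see why counting shared genes can mislead, the sketch below contrasts two membership-based summaries (shared-gene count and Jaccard index) with a simple coefficient-aware one, the cosine similarity of the two signatures' coefficient vectors. The cosine measure is our illustrative assumption and is not claimed to be one of the measures proposed in the paper; signatures are represented as hypothetical maps from gene to nonzero Lasso coefficient, and the helper names are our own.

```python
import math


def membership_overlap(sig_a, sig_b):
    """Count of shared genes and Jaccard index of two signatures."""
    a, b = set(sig_a), set(sig_b)
    shared = len(a & b)
    union = len(a | b)
    jaccard = shared / union if union else 0.0
    return shared, jaccard


def coefficient_cosine(sig_a, sig_b):
    """Illustrative (assumed) measure: cosine similarity of coefficient
    vectors over the union of genes; a gene missing from a signature
    contributes a zero coefficient."""
    genes = set(sig_a) | set(sig_b)
    dot = sum(sig_a.get(g, 0.0) * sig_b.get(g, 0.0) for g in genes)
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical Cox-Lasso coefficients from two studies.
    study1 = {"g1": 0.5, "g2": -0.3}
    study2 = {"g1": -0.5, "g2": 0.3}  # same genes, opposite effect directions
    print(membership_overlap(study1, study2))
    print(coefficient_cosine(study1, study2))
```

Here the two signatures share every gene, so the count and Jaccard index suggest complete agreement, while the coefficient-based cosine is close to -1 because every estimated effect has the opposite sign.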
8
Marakhonov A, Sadovskaya N, Antonov I, Baranova A, Skoblov M. Analysis of discordant Affymetrix probesets casts serious doubt on idea of microarray data reutilization. BMC Genomics 2014; 15 Suppl 12:S8. [PMID: 25563078 PMCID: PMC4303952 DOI: 10.1186/1471-2164-15-s12-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
Background: Affymetrix microarray technology allows one to investigate the expression of thousands of genes simultaneously under a variety of conditions. In the popular U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. Discordant expression observed for two different probesets that match the same gene is a widespread phenomenon that is usually underestimated, ignored or disregarded.
Results: Here we evaluate the prevalence of discordant expression in data collected using the Affymetrix HG-U133A microarray platform. In U133A, about 30% of genes annotated by two different probesets demonstrate a substantial correlation between independently measured expression values. To our surprise, sorting the probesets according to the nature of the discrepancy in their expression levels allowed the classification of the respective genes according to their fundamental functional properties, including observed enrichment for tissue-specific transcripts and alternatively spliced variants. On the other hand, the absence of discrepancies in probesets that simultaneously match several different genes allowed us to pinpoint non-expressed pseudogenes and gene groups with highly correlated expression patterns. Nevertheless, in many cases, the nature of the discordant expression of two probesets that match the same transcript remains unexplained. It is possible that these probesets report differently regulated sets of transcripts or, in the best-case scenario, two different sets of transcripts that represent the same gene.
Conclusion: The majority of absolute gene expression values collected using Affymetrix microarrays may not be suitable for typical interpretative downstream analysis.
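A basic version of the concordance check described in this abstract can be sketched as follows: for each gene measured by two probesets, correlate the two probesets' expression values across samples and flag the pair as discordant when the correlation falls below a cut-off. The abstract does not quantify "substantial correlation", so the 0.5 threshold is a hypothetical choice, as are the helper names, gene labels, and expression values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat probeset cannot show concordant variation
    return sxy / math.sqrt(sxx * syy)


def classify_probeset_pairs(pairs, threshold=0.5):
    """pairs maps gene -> (values_probeset_1, values_probeset_2), both
    measured on the same samples in the same order. Assumption: a pair
    is 'concordant' when r >= threshold. Returns gene -> (r, label)."""
    result = {}
    for gene, (p1, p2) in pairs.items():
        r = pearson(p1, p2)
        label = "concordant" if r >= threshold else "discordant"
        result[gene] = (r, label)
    return result


def fraction_concordant(classified):
    """Share of genes whose two probesets were labelled concordant."""
    if not classified:
        return 0.0
    hits = sum(1 for _, label in classified.values() if label == "concordant")
    return hits / len(classified)


if __name__ == "__main__":
    # Hypothetical log-scale signals for two genes, each with two probesets.
    pairs = {
        "GENE1": ([5.1, 6.0, 7.2, 8.1], [4.9, 6.2, 7.0, 8.3]),
        "GENE2": ([5.0, 6.0, 7.0, 8.0], [8.0, 7.1, 6.0, 5.2]),
    }
    classified = classify_probeset_pairs(pairs)
    for gene, (r, label) in classified.items():
        print(f"{gene}: r = {r:.2f} ({label})")
    print("fraction concordant:", fraction_concordant(classified))
```

Applied to every multi-probeset gene on a platform, `fraction_concordant` gives a summary comparable in spirit to the abstract's percentage of genes whose probesets correlate, although the paper's exact criterion is not stated.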