1. Zheng X, Lim PK, Mutwil M, Wang Y. A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering. BMC Plant Biology 2024; 24:373. [PMID: 38714965 PMCID: PMC11077725 DOI: 10.1186/s12870-024-05086-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Accepted: 04/30/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND As one of the world's most important beverage crops, tea plants (Camellia sinensis) are renowned for their unique flavors and numerous beneficial secondary metabolites, attracting researchers to investigate the formation of tea quality. With the increasing availability of transcriptome data on tea plants in public databases, conducting large-scale co-expression analyses has become feasible to meet the demand for functional characterization of tea plant genes. However, as the multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing a subset of samples generated by effectively downsampling and reorganizing the global sample set often leads to more accurate results in co-expression analysis. Meanwhile, global-based co-expression analyses are more likely to overlook condition-specific gene interactions, which may be more important and worthy of exploration and research.
RESULTS Here, we employed the k-means clustering method to organize and classify the global samples of tea plants, resulting in clustered samples. Metadata annotations were then performed on these clustered samples to determine the "conditions" represented by each cluster. Subsequently, we conducted weighted gene co-expression network analysis (WGCNA) separately on the global samples and the clustered samples, resulting in global modules and cluster-specific modules. Comparative analyses of global modules and cluster-specific modules demonstrated that cluster-specific modules exhibit higher accuracy in co-expression analysis. To measure the degree of condition specificity of genes within condition-specific clusters, we introduced the correlation difference value (CDV). By incorporating the CDV into co-expression analyses, we can assess the condition specificity of genes. This approach proved instrumental in identifying a series of high-CDV transcription factor-encoding genes upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and in pinpointing a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress.
CONCLUSIONS To summarize, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis. Cluster-specific modules were more accurate in capturing condition-specific gene interactions. The introduction of CDV allowed for the assessment of condition specificity in gene co-expression analyses. Using this approach, we identified a series of high-CDV transcription factor-encoding genes related to sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress response in Camellia sinensis.
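The workflow described in this abstract (cluster the samples, then compare co-expression within a cluster against the global sample set) can be illustrated with a minimal, standard-library Python sketch. The toy expression values, the function names `pearson` and `kmeans`, the fixed initial centroids, and the simple difference `corr_diff` are all assumptions made for illustration. In particular, `corr_diff` only stands in for the idea of a cluster-versus-global comparison; it is not the paper's definition of the CDV, which the abstract does not give, and WGCNA itself is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kmeans(points, init_idx, n_iter=20):
    """Plain Lloyd-style k-means; init_idx picks the starting centroids."""
    centroids = [list(points[i]) for i in init_idx]
    labels = []
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Move each centroid to the mean of its members (left unchanged if empty).
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: each tuple is one sample, columns are (gene1, gene2, gene3).
samples = [(1, 2, 0), (2, 4, 0), (3, 6, 0),
           (1, 3, 100), (2, 1, 100), (3, 2, 100)]

labels = kmeans(samples, init_idx=(0, 3))
gene1 = [s[0] for s in samples]
gene2 = [s[1] for s in samples]

# Correlation over all samples versus over the cluster containing sample 0.
r_global = pearson(gene1, gene2)
in_cluster = [i for i, lab in enumerate(labels) if lab == labels[0]]
r_cluster = pearson([gene1[i] for i in in_cluster],
                    [gene2[i] for i in in_cluster])

# Illustrative difference only; NOT the paper's CDV formula.
corr_diff = r_cluster - r_global
print(labels, r_global, r_cluster, corr_diff)
```

In this toy case the two genes are perfectly correlated inside the first cluster but only weakly correlated across all samples, which is the kind of condition-specific signal a global analysis can dilute.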
Affiliation(s)
- Xinghai Zheng
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
- Peng Ken Lim
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
- Yuefei Wang
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
2. Orduña L, Santiago A, Navarro-Payá D, Zhang C, Wong DCJ, Matus JT. Aggregated gene co-expression networks predict transcription factor regulatory landscapes in grapevine. Journal of Experimental Botany 2023; 74:6522-6540. [PMID: 37668374 DOI: 10.1093/jxb/erad344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 08/30/2023] [Indexed: 09/06/2023]
Abstract
Gene co-expression networks (GCNs) have not been extensively studied in non-model plants. However, the rapid accumulation of transcriptome datasets in certain species represents an opportunity to explore underutilized network aggregation approaches. In fact, aggregated GCNs (aggGCNs) highlight robust co-expression interactions and improve functional connectivity. We applied and evaluated two different aggregation methods on public grapevine RNA-Seq datasets from three different tissues (leaf, berry, and 'all organs'). Our results show that co-occurrence-based aggregation generally yielded the best-performing networks. We applied aggGCNs to study several transcription factor gene families, showing their capacity for detecting both already-described and novel regulatory relationships between R2R3-MYBs, bHLH/MYC, and multiple specialized metabolic pathways. Specifically, transcription factor gene- and pathway-centered network analyses successfully ascertained the previously established role of VviMYBPA1 in controlling the accumulation of proanthocyanidins while providing insights into its novel role as a regulator of p-coumaroyl-CoA biosynthesis as well as the shikimate and aromatic amino acid pathways. This network was validated using DNA affinity purification sequencing data, demonstrating that co-expression networks of transcriptional activators can serve as a proxy of gene regulatory networks. This study presents an open repository to reproduce networks in other crops and a GCN application within the Vitviz platform, a user-friendly tool for exploring co-expression relationships.
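One way to picture co-occurrence-based aggregation is to count, for every gene pair, how many per-dataset networks contain that edge and to keep the pairs that recur. The sketch below assumes that reading; the function name `aggregate_by_cooccurrence`, the `min_count` threshold, and the placeholder gene names are hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter

def aggregate_by_cooccurrence(networks, min_count=2):
    """Count in how many networks each undirected edge occurs.

    `networks` is a list of edge lists, one per dataset. Edges seen in at
    least `min_count` networks are returned with their occurrence count.
    """
    counts = Counter()
    for edges in networks:
        # Sort each pair so (a, b) and (b, a) are the same edge, and use a
        # set so an edge is counted at most once per network.
        counts.update({tuple(sorted(edge)) for edge in edges})
    return {edge: n for edge, n in counts.items() if n >= min_count}

# Hypothetical per-dataset networks with placeholder gene names.
networks = [
    [("geneA", "geneB"), ("geneA", "geneC")],
    [("geneB", "geneA"), ("geneD", "geneE")],
    [("geneA", "geneB"), ("geneC", "geneA")],
]

aggregated = aggregate_by_cooccurrence(networks, min_count=2)
print(aggregated)
```

Edges supported by only one dataset, such as the `geneD`/`geneE` pair here, drop out, which is how aggregation favours robust interactions over dataset-specific ones.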
Affiliation(s)
- Luis Orduña
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
- Antonio Santiago
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
- David Navarro-Payá
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
- Chen Zhang
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
- Darren C J Wong
  - Ecology and Evolution, Research School of Biology, The Australian National University, Acton, Australia
- José Tomás Matus
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
3. Latapiat V, Saez M, Pedroso I, Martin AJM. Unraveling patient heterogeneity in complex diseases through individualized co-expression networks: a perspective. Front Genet 2023; 14:1209416. [PMID: 37636264 PMCID: PMC10449456 DOI: 10.3389/fgene.2023.1209416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
This perspective highlights the potential of individualized networks as a novel strategy for studying complex diseases through patient stratification, enabling advancements in precision medicine. We emphasize the impact of interpatient heterogeneity resulting from genetic and environmental factors and discuss how individualized networks improve our ability to develop treatments and enhance diagnostics. The integration of systems biology with multimodal information, such as genomic and clinical data, has reached a tipping point, allowing the inference of biological networks at single-individual resolution. This approach generates a specific biological network per sample, representing the individual from which the sample originated. The availability of individualized networks enables applications in personalized medicine, such as identifying malfunctions and selecting tailored treatments. In essence, reliable individualized networks can expedite research progress in understanding drug response variability by modeling heterogeneity among individuals and enabling the personalized selection of pharmacological targets for treatment. Therefore, developing diverse and cost-effective approaches for generating these networks is crucial for widespread application in clinical services.
Affiliation(s)
- Verónica Latapiat
  - Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
  - Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- Mauricio Saez
  - Centro de Oncología de Precisión, Facultad de Medicina y Ciencias de la Salud, Universidad Mayor, Santiago, Chile
  - Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco, Chile
- Inti Pedroso
  - Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Alberto J. M. Martin
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
  - Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
4. Ribone AI, Fass M, Gonzalez S, Lia V, Paniego N, Rivarola M. Co-Expression Networks in Sunflower: Harnessing the Power of Multi-Study Transcriptomic Public Data to Identify and Categorize Candidate Genes for Fungal Resistance. Plants (Basel, Switzerland) 2023; 12:2767. [PMID: 37570920 PMCID: PMC10421300 DOI: 10.3390/plants12152767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 07/19/2023] [Accepted: 07/21/2023] [Indexed: 08/13/2023]
Abstract
Fungal plant diseases are a major threat to food security worldwide. Current efforts to identify and list loci involved in different biological processes are more complicated than originally thought, even when complete genome assemblies are available. Despite numerous experimental and computational efforts to characterize gene functions in plants, about 40% of protein-coding genes in the model plant Arabidopsis thaliana L. are still not categorized in the Gene Ontology (GO) Biological Process (BP) annotation. In non-model organisms, such as sunflower (Helianthus annuus L.), the number of BP term annotations is far fewer, ~22%. In the current study, we performed gene co-expression network analysis using eight terabytes of public transcriptome datasets and expression-based functional prediction to categorize and identify loci involved in the response to fungal pathogens. We were able to construct a reference gene network of healthy green tissue (GreenGCN) and a gene network of healthy and stressed root tissues (RootGCN). Both networks achieved robust, high-quality scores on the metrics of guilt-by-association and selective constraints versus gene connectivity. We were able to identify eight modules enriched in defense functions, of which two out of the three modules in the RootGCN were also conserved in the GreenGCN, suggesting similar defense-related expression patterns. We identified 16 WRKY genes involved in defense-related functions and 65 previously uncharacterized loci now linked to defense response. In addition, we identified and classified 122 loci previously identified within QTLs or near candidate loci reported in GWAS studies of disease resistance in sunflower linked to defense response. All in all, we have implemented a valuable strategy to better describe genes within specific biological processes.
Affiliation(s)
- Máximo Rivarola
  - Instituto de Agrobiotecnología y Biología Molecular (IABIMO), CICVyA-Instituto Nacional de Tecnología Agropecuaria (INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Los Reseros y Nicolás Repetto, Hurlingham 1686, Argentina; (A.I.R.); (M.F.); (S.G.); (V.L.); (N.P.)
5. Choi Y, Li R, Quon G. siVAE: interpretable deep generative models for single-cell transcriptomes. Genome Biol 2023; 24:29. [PMID: 36803416 PMCID: PMC9940350 DOI: 10.1186/s13059-023-02850-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/08/2022] [Accepted: 01/06/2023] [Indexed: 02/22/2023]
Abstract
Neural networks such as variational autoencoders (VAE) perform dimensionality reduction for the visualization and analysis of genomic data, but are limited in their interpretability: it is unknown which data features are represented by each embedding dimension. We present siVAE, a VAE that is interpretable by design, thereby enhancing downstream analysis tasks. Through interpretation, siVAE also identifies gene modules and hubs without explicit gene network inference. We use siVAE to identify gene modules whose connectivity is associated with diverse phenotypes such as iPSC neuronal differentiation efficiency and dementia, showcasing the wide applicability of interpretable generative models for genomic data analysis.
Affiliation(s)
- Yongin Choi
  - Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
  - Genome Center, University of California, Davis, Davis, CA, USA
- Ruoxin Li
  - Genome Center, University of California, Davis, Davis, CA, USA
  - Graduate Group in Biostatistics, University of California, Davis, Davis, CA, USA
- Gerald Quon
  - Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
  - Genome Center, University of California, Davis, Davis, CA, USA
  - Department of Molecular and Cellular Biology, University of California, Davis, Davis, CA, USA
6. Raina P, Guinea R, Chatsirisupachai K, Lopes I, Farooq Z, Guinea C, Solyom CA, de Magalhães JP. GeneFriends: gene co-expression databases and tools for humans and model organisms. Nucleic Acids Res 2022; 51:D145-D158. [PMID: 36454018 PMCID: PMC9825523 DOI: 10.1093/nar/gkac1031] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/09/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 12/05/2022]
Abstract
Gene co-expression analysis has emerged as a powerful method to provide insights into gene function and regulation. The rapid growth of publicly available RNA-sequencing (RNA-seq) data has created opportunities for researchers to employ this abundant data to help decipher the complexity and biology of genomes. Co-expression networks have proven effective for inferring the relationships between genes, for gene prioritization, and for assigning function to poorly annotated genes based on their co-expressed partners. To facilitate such analyses, we previously created an online co-expression tool for humans and mice entitled GeneFriends. To continue providing a valuable tool to the scientific community, we have now updated the GeneFriends database and website. Here, we present the new version of GeneFriends, which includes gene and transcript co-expression networks based on RNA-seq data from 46 475 human and 34 322 mouse samples. The new database also encompasses tissue-specific gene co-expression networks for 20 human and 21 mouse tissues, dataset-specific gene co-expression maps based on the TCGA and GTEx projects, and gene co-expression networks for seven additional model organisms (fruit fly, zebrafish, worm, rat, yeast, cow and chicken). GeneFriends is freely available at http://www.genefriends.org/.
Affiliation(s)
- Priyanka Raina
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Rodrigo Guinea
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Kasit Chatsirisupachai
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Inês Lopes
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Zoya Farooq
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Cristina Guinea
  - UCAL - Universidad de Ciencias y Artes de América Latina, Faculty of Design, Lima 15026, Perú
- Csaba-Attila Solyom
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
7. Obayashi T, Kodate S, Hibara H, Kagaya Y, Kinoshita K. COXPRESdb v8: an animal gene coexpression database navigating from a global view to detailed investigations. Nucleic Acids Res 2022; 51:D80-D87. [PMID: 36350658 PMCID: PMC9825429 DOI: 10.1093/nar/gkac983] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/22/2022] [Revised: 10/12/2022] [Accepted: 10/15/2022] [Indexed: 11/10/2022]
Abstract
Gene coexpression is the synchronization of gene expression across many cellular and environmental conditions and is widely used to infer the biological function of genes. Gene coexpression information is complex, comprising a complete graph of all genes in the genome, and requires appropriate visualization and analysis tools. Since its initial release in 2007, the animal gene expression database COXPRESdb (https://coxpresdb.jp) has been continuously improved by adding new gene coexpression data and analysis tools. Here, we report COXPRESdb version 8, which has been enhanced with new features for an overview, summary, and individual examination of coexpression relationships: CoexMap to display coexpression on a genome scale, pathway enrichment analysis to summarize the function of coexpressed genes, and CoexPub to bridge coexpression and existing knowledge. COXPRESdb also facilitates downstream analyses such as interspecies comparisons by integrating RNAseq and microarray coexpression data in a union-type gene coexpression. COXPRESdb strongly supports users with the new coexpression data and enhanced functionality.
Affiliation(s)
- Takeshi Obayashi
  - To whom correspondence should be addressed. Tel: +81 22 795 4741; Fax: +81 22 795 4765;
- Shun Kodate
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, 980-8573, Japan
- Himiko Hibara
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679, Japan
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
8. Zhang Y, Nie H, Yin Z, Yan X. Comparative transcriptomic analysis revealed dynamic changes of distinct classes of genes during development of the Manila clam (Ruditapes philippinarum). BMC Genomics 2022; 23:676. [PMID: 36175832 PMCID: PMC9524096 DOI: 10.1186/s12864-022-08813-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Accepted: 07/28/2022] [Indexed: 11/10/2022]
Abstract
Background The Manila clam Ruditapes philippinarum is one of the most economically important marine shellfish. However, the molecular mechanisms of early development in Manila clams are largely unknown. In this study, we collected samples from 13 stages of early development in Manila clam and compared the mRNA expression pattern between samples by RNA-seq techniques.
Results We applied RNA-seq technology to 13 embryonic and larval stages of the Manila clam to identify critical genes and pathways involved in their development and biological characteristics. Important genes associated with different morphologies during the early fertilized egg, cell division, cell differentiation, hatching, and metamorphosis stages were identified. We detected the highest number of differentially expressed genes in the comparison of the pediveliger and single pipe juvenile stages, which is a time when biological characteristics greatly change during metamorphosis. Gene Ontology (GO) enrichment analysis showed that expression levels of microtubule protein-related molecules and Rho genes were upregulated and that GO terms such as ribosome, translation, and organelle were enriched in the early development stages of the Manila clam. Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that the foxo, wnt, and transforming growth factor-beta pathways were significantly enriched during early development. These results provide insights into the molecular mechanisms at work during different periods of early development of Manila clams.
Conclusion These transcriptomic data provide clues to the molecular mechanisms underlying the development of Manila clam larvae. These results will help to improve Manila clam reproduction and development.
Affiliation(s)
- Yanming Zhang
  - College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
  - Engineering Research Center of Shellfish Culture and Breeding in Liaoning Province, College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
- Hongtao Nie
  - College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
  - Engineering Research Center of Shellfish Culture and Breeding in Liaoning Province, College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
- Zhihui Yin
  - College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
  - Engineering Research Center of Shellfish Culture and Breeding in Liaoning Province, College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
- Xiwu Yan
  - College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
  - Engineering Research Center of Shellfish Culture and Breeding in Liaoning Province, College of Fisheries and Life Science, Dalian Ocean University, 116023, Dalian, China
9. Obayashi T, Hibara H, Kagaya Y, Aoki Y, Kinoshita K. ATTED-II v11: A Plant Gene Coexpression Database Using a Sample Balancing Technique by Subagging of Principal Components. Plant & Cell Physiology 2022; 63:869-881. [PMID: 35353884 DOI: 10.1093/pcp/pcac041] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/16/2021] [Revised: 02/06/2022] [Accepted: 03/29/2022] [Indexed: 05/25/2023]
Abstract
ATTED-II (https://atted.jp) is a gene coexpression database for nine plant species based on publicly available RNAseq and microarray data. One of the challenges in constructing condition-independent coexpression data based on publicly available gene expression data is managing the inherent sampling bias. Here, we report ATTED-II version 11, wherein we adopted a coexpression calculation methodology to balance the samples using principal component analysis and ensemble calculation. This approach has two advantages. First, omitting principal components with low contribution rates reduces the main contributors of noise. Second, balancing large differences in contribution rates enables considering various sample conditions entirely. In addition, based on RNAseq- and microarray-based coexpression data, we provide species-representative, integrated coexpression information to enhance the efficiency of interspecies comparison of the coexpression data. These coexpression data are provided as a standardized z-score to facilitate integrated analysis with different data sources. We believe that with these improvements, ATTED-II is more valuable and powerful for supporting interspecies comparative studies and integrated analyses using heterogeneous data.
Affiliation(s)
- Takeshi Obayashi
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Himiko Hibara
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Yuki Kagaya
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Yuichi Aoki
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
- Kengo Kinoshita
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
  - Institute of Development, Aging, and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, 980-8575 Japan
10. Arshad Z, McDonald JF. A computational approach to generate highly conserved gene co-expression networks with RNA-seq data. STAR Protoc 2022; 3:101432. [PMID: 35677606 PMCID: PMC9168722 DOI: 10.1016/j.xpro.2022.101432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Affiliation(s)
- Zainab Arshad
  - Integrated Cancer Research Center, School of Biological Sciences, Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30619, USA
- John F. McDonald
  - Integrated Cancer Research Center, School of Biological Sciences, Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30619, USA
  - Corresponding author
11. GCEN: An Easy-to-Use Toolkit for Gene Co-Expression Network Analysis and lncRNAs Annotation. Curr Issues Mol Biol 2022; 44:1479-1487. [PMID: 35723358 PMCID: PMC9164028 DOI: 10.3390/cimb44040100] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2022] [Revised: 03/13/2022] [Accepted: 03/23/2022] [Indexed: 02/07/2023]
Abstract
Gene co-expression network analysis has been widely used in gene function annotation, especially for long noncoding RNAs (lncRNAs). However, there is a lack of effective cross-platform analysis tools. For biologists to easily build a gene co-expression network and to predict gene function, we developed GCEN, a cross-platform command-line toolkit developed with C++. It is an efficient and easy-to-use solution that will allow everyone to perform gene co-expression network analysis without the requirement of sophisticated programming skills, especially in cases of RNA-Seq research and lncRNAs function annotation. Because of its modular design, GCEN can be easily integrated into other pipelines.
12. González-Espinoza A, Zamora-Fuentes J, Hernández-Lemus E, Espinal-Enríquez J. Gene Co-Expression in Breast Cancer: A Matter of Distance. Front Oncol 2021; 11:726493. [PMID: 34868919 PMCID: PMC8636045 DOI: 10.3389/fonc.2021.726493] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2021] [Accepted: 10/26/2021] [Indexed: 01/16/2023]
Abstract
Gene regulatory and signaling phenomena are known to be relevant players underlying the establishment of cellular phenotypes. It is also known that such regulatory programs are disrupted in cancer, leading to the onset and development of malignant phenotypes. Gene co-expression matrices have allowed us to compare and analyze complex phenotypes such as breast cancer (BrCa) and their control counterparts. Global co-expression patterns have revealed, for instance, that the highest gene-gene co-expression interactions often occur between genes from the same chromosome (cis-), whereas inter-chromosome (trans-) interactions are scarce and have lower correlation values. Furthermore, the strength of cis- correlations has been shown to decay with the chromosome distance of gene couples. Although this loss of long-distance co-expression has been clearly identified, it has been observed only in a small fraction of the whole co-expression landscape, namely the most significant interactions. For that reason, an approach that takes into account the whole interaction set is appealing. In this work, we developed a hybrid method to analyze whole-chromosome Pearson correlation matrices for the four BrCa subtypes (Luminal A, Luminal B, HER2+ and Basal), as well as matrices derived from adjacent normal breast tissue. We implemented a systematic method for clustering gene couples, by using eigenvalue spectral decomposition and the k-medoids algorithm, allowing us to determine a number of clusters without removing any interaction. With this method we compared, for each chromosome in the five phenotypes: a) whether or not the gene-gene co-expression decays with the distance in the breast cancer subtypes, b) the chromosome location of cis- clusters of gene couples, and c) whether or not the loss of long-distance co-expression is observed in the whole range of interactions.
We found that in the correlation matrix for the control phenotype, positive and negative Pearson correlations deviate from a random null model independently of the distance between couples. Conversely, for all BrCa subtypes, in all chromosomes, positive correlations decay with distance, and negative correlations do not differ from the null model. We also found that BrCa clusters are distance-dependent, whereas for the control phenotype, chromosome location does not determine the clustering. To our knowledge, this is the first time that a dependence on distance is reported for gene clusters in breast cancer. Since this method uses the whole cis- interaction gene set, combining it with other -omics approaches may provide further evidence to understand, in a more integrative fashion, the mechanisms that disrupt gene regulation in cancer.
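The notion of co-expression decaying with chromosomal distance can be made concrete by grouping same-chromosome gene pairs into distance bins and averaging their correlations per bin. The sketch below shows only that simple binning; it does not implement the eigenvalue spectral decomposition or k-medoids clustering described in the abstract. The function name `mean_corr_by_distance`, the bin size, and the (distance, correlation) values are made up for illustration.

```python
from collections import defaultdict

def mean_corr_by_distance(pairs, bin_size):
    """Average correlation per distance bin.

    `pairs` holds (distance_in_bp, correlation) tuples for gene couples on
    the same chromosome; bin k covers [k * bin_size, (k + 1) * bin_size).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, corr in pairs:
        k = distance // bin_size
        sums[k] += corr
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical cis gene pairs: (distance between the two genes, Pearson r).
pairs = [(10_000, 0.75), (40_000, 0.25),
         (120_000, 0.5), (180_000, 0.0),
         (250_000, 0.125)]

profile = mean_corr_by_distance(pairs, bin_size=100_000)
print(profile)
```

A profile whose mean correlation falls from bin to bin, as in this toy input, is the distance-dependent pattern the abstract reports for positive correlations in the BrCa subtypes.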
Affiliation(s)
- Alfredo González-Espinoza
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, United States
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Jose Zamora-Fuentes
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
13
|
Yu H, Wang L, Chen D, Li J, Guo Y. Conditional transcriptional relationships may serve as cancer prognostic markers. BMC Med Genomics 2021; 14:101. [PMID: 34856998 PMCID: PMC8638091 DOI: 10.1186/s12920-021-00958-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/28/2021] [Accepted: 04/08/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND While most differential coexpression (DC) methods are bound to quantify a single correlation value for a gene pair across multiple samples, a newly devised approach named Correlation by Individual Level Product (CILP) projects the summary correlation value onto individual product correlation values for separate samples. CILP has greatly widened DC analysis opportunities by allowing integration of non-compromised statistical methods. METHODS Here, we performed a study to verify our hypothesis that conditional relationships, i.e., gene pairs with remarkable differential coexpression, may serve as quantitative prognostic markers for human cancers. Alongside the search for prognostic gene links in a pan-cancer setting, we also examined whether a trend of global expression correlation loss appeared in a wide panel of cancer types and revisited the controversial subject of the mutual relationship between the differential expression (DE) approach and the DC approach. RESULTS By integrating CILP with classical univariate survival analysis, we identified up to 244 conditional gene links as potential prognostic markers in five cancer types. In particular, five prognostic gene links for kidney renal papillary cell carcinoma tended to condense around the cancer gene ESPL1, and the transcriptional synchrony between ESPL1 and PTTG1 tended to be elevated in patients with adverse prognosis. In addition, we extended the observation of a global trend of correlation loss to more than ten cancer types and showed empirically that DC analysis results were independent of gene differential expression in five cancer types. CONCLUSIONS Combining the power of CILP and classical survival analysis, we identified conditional transcriptional relationships that conferred prognostic power for five cancer types.
Despite a general trend of global correlation loss in tumor transcriptomes, most of these prognosis conditional links demonstrated stronger expression correlation in tumors, and their stronger coexpression was associated with poor survival.
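The idea of projecting a summary correlation onto per-sample values can be sketched in Python as follows. The abstract does not give the CILP formula, so this sketch assumes, from the method's name alone, that the per-sample value is the product of the two genes' standardized expression values; with population standard deviations, the mean of these products equals the Pearson correlation. The function name `individual_level_products` is hypothetical.

```python
import numpy as np

def individual_level_products(x, y):
    """Per-sample products of z-scores for two genes (assumed reading of CILP).

    With population standard deviations (ddof=0), the mean of the returned
    vector equals the Pearson correlation between x and y.
    """
    zx = (x - x.mean()) / x.std()   # np.std defaults to ddof=0
    zy = (y - y.mean()) / y.std()
    return zx * zy                  # one value per sample

# Toy expression values for two genes across four samples (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
products = individual_level_products(x, y)
```

Because each sample now carries its own value, the vector can be passed to per-sample statistical methods, such as the univariate survival analysis mentioned in the abstract, instead of a single pooled coefficient.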
Affiliation(s)
- Hui Yu
  - Department of Internal Medicine, University of New Mexico, Albuquerque, NM, 87131, USA
- Limei Wang
  - Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan, 571199, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Danqian Chen
  - Key Laboratory of Resource Biology and Biotechnology in Western China, School of Life Sciences, Northwest University, Xi'an, 710069, Shaanxi, China
- Jin Li
  - Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan, 571199, China
- Yan Guo
  - Department of Internal Medicine, University of New Mexico, Albuquerque, NM, 87131, USA
|
14
|
Burns JJR, Shealy BT, Greer MS, Hadish JA, McGowan MT, Biggs T, Smith MC, Feltus FA, Ficklin SP. Addressing noise in co-expression network construction. Brief Bioinform 2021; 23:6446269. [PMID: 34850822 PMCID: PMC8769892 DOI: 10.1093/bib/bbab495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022]
Abstract
Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction and to create csGCNs, we provide a review of the workflow.
Affiliation(s)
- Joshua J R Burns
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Benjamin T Shealy
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- Mitchell S Greer
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
- John A Hadish
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Matthew T McGowan
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Tyler Biggs
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Melissa C Smith
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- F Alex Feltus
  - Department of Genetics and Biochemistry, 130 McGinty Court, Clemson University, Clemson, SC 29634, USA
  - Biomedical Data Science & Informatics Program, 100 McAdams Hall, Clemson University, Clemson, SC 29634, USA
  - Clemson Center for Human Genetics, 114 Gregor Mendel Circle, Greenwood, SC 29646, USA
- Stephen P Ficklin
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
|
15
|
Kuang J, Scoglio C. Layer reconstruction and missing link prediction of a multilayer network with maximum a posteriori estimation. Phys Rev E 2021; 104:024301. [PMID: 34525660 PMCID: PMC8445383 DOI: 10.1103/physreve.104.024301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 07/16/2021] [Indexed: 04/23/2023]
Abstract
From social networks to biological networks, different types of interactions among the same set of nodes characterize distinct layers, which are termed multilayer networks. Within a multilayer network, some layers, confirmed through different experiments, could be structurally similar and interdependent. In this paper, we propose a maximum a posteriori-based method to study and reconstruct the structure of a target layer in a multilayer network. Nodes within the target layer are characterized by vectors, which are employed to compute edge weights. Further, to detect structurally similar layers, we propose a method for comparing networks based on the eigenvector centrality. Using similar layers, we obtain the parameters of the conjugate prior. With this maximum a posteriori algorithm, we can reconstruct the target layer and predict missing links. We test the method on two real multilayer networks, and the results show that the maximum a posteriori estimation is promising in reconstructing the target layer even when a large number of links is missing.
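The abstract mentions comparing layers through eigenvector centrality without stating the comparison measure. The Python sketch below computes eigenvector centrality for an undirected layer and, as an assumption for illustration, scores two layers by the cosine similarity of their centrality vectors. The names `eigenvector_centrality` and `layer_similarity` are hypothetical, and this is not the authors' method.

```python
import numpy as np

def eigenvector_centrality(adj):
    """Leading eigenvector of a symmetric adjacency matrix, unit-normalized."""
    vals, vecs = np.linalg.eigh(adj)      # eigenvalues in ascending order
    v = np.abs(vecs[:, -1])               # eigenvector of the largest eigenvalue
    return v / np.linalg.norm(v)

def layer_similarity(adj_a, adj_b):
    """Cosine similarity of two layers' centrality vectors (assumed measure)."""
    return float(eigenvector_centrality(adj_a) @ eigenvector_centrality(adj_b))

# Two toy layers over the same three nodes: a path and a triangle.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)
triangle = np.ones((3, 3)) - np.eye(3)
similarity = layer_similarity(path, triangle)
```

Taking the absolute value of the leading eigenvector assumes a connected layer, where that eigenvector can be chosen with non-negative entries.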
Affiliation(s)
- Junyao Kuang
  - Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS 66506, USA
- Caterina Scoglio
  - Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS 66506, USA
|
16
|
Lemoine GG, Scott-Boyer MP, Ambroise B, Périn O, Droit A. GWENA: gene co-expression networks analysis and extended modules characterization in a single Bioconductor package. BMC Bioinformatics 2021; 22:267. [PMID: 34034647 PMCID: PMC8152313 DOI: 10.1186/s12859-021-04179-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/22/2020] [Accepted: 05/07/2021] [Indexed: 12/20/2022]
Abstract
BACKGROUND Network-based analysis of gene expression through co-expression networks can be used to investigate modular relationships occurring between genes performing different biological functions. An extended description of each of the network modules is therefore a critical step to understand the underlying processes contributing to a disease or a phenotype. Biological integration, topology study and comparison of conditions (e.g. wild type vs mutant) are the main methods to do so, but to date no tool combines them all into a single pipeline. RESULTS Here we present GWENA, a new R package that integrates gene co-expression network construction and full characterization of the detected modules through gene set enrichment, phenotypic association, hub gene detection, topological metric computation, and differential co-expression. To demonstrate its performance, we applied GWENA to two skeletal muscle datasets from young and old patients of the GTEx study. Remarkably, we prioritized a gene whose involvement in muscle development and growth was unknown. Moreover, new insights into the variations in patterns of co-expression were identified. The known phenomenon of connectivity loss associated with aging was found to be coupled to a global reorganization of the relationships leading to expression of known aging-related functions. CONCLUSION GWENA is an R package available through Bioconductor ( https://bioconductor.org/packages/release/bioc/html/GWENA.html ) that has been developed to perform extended analysis of gene co-expression networks. Thanks to biological and topological information as well as differential co-expression, the package helps to dissect the role of gene relationships in disease conditions or targeted phenotypes. GWENA goes beyond existing packages that perform co-expression analysis by including new tools to fully characterize modules, such as differential co-expression, additional enrichment databases, and network visualization.
Affiliation(s)
- Gwenaëlle G. Lemoine
  - Département de médecine moléculaire, Faculté de médecine, Université Laval, 2325 rue de l’Université, Québec, G1V 0A6 Canada
- Marie-Pier Scott-Boyer
  - Centre de recherche du CHU de Québec-Université Laval, 2705 boulevard Laurier Québec, Québec, G1V 4G2 Canada
- Bathilde Ambroise
  - L’Oréal Research and Innovation, 15 rue Pierre Dreyfus, 92110 Clichy, France
- Olivier Périn
  - L’Oréal Research and Innovation, 15 rue Pierre Dreyfus, 92110 Clichy, France
- Arnaud Droit
  - Département de médecine moléculaire, Faculté de médecine, Université Laval, 2325 rue de l’Université, Québec, G1V 0A6 Canada
  - Centre de recherche du CHU de Québec-Université Laval, 2705 boulevard Laurier Québec, Québec, G1V 4G2 Canada
|
17
|
Emad A, Sinha S. Inference of phenotype-relevant transcriptional regulatory networks elucidates cancer type-specific regulatory mechanisms in a pan-cancer study. NPJ Syst Biol Appl 2021; 7:9. [PMID: 33558504 PMCID: PMC7870953 DOI: 10.1038/s41540-021-00169-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/15/2019] [Accepted: 01/05/2021] [Indexed: 01/30/2023]
Abstract
Reconstruction of transcriptional regulatory networks (TRNs) is a powerful approach to unravel the gene expression programs involved in healthy and disease states of a cell. However, these networks are usually reconstructed independent of the phenotypic (or clinical) properties of the samples. Therefore, they may confound regulatory mechanisms that are specifically related to a phenotypic property with more general mechanisms underlying the full complement of the analyzed samples. In this study, we develop a method called InPheRNo to identify "phenotype-relevant" TRNs. This method is based on a probabilistic graphical model that models the simultaneous effects of multiple transcription factors (TFs) on their target genes and the statistical relationship between the target genes' expression and the phenotype. Extensive comparison of InPheRNo with related approaches using primary tumor samples of 18 cancer types from The Cancer Genome Atlas reveals that InPheRNo can accurately reconstruct cancer type-relevant TRNs and identify cancer driver TFs. In addition, survival analysis reveals that the activity level of TFs with many target genes could distinguish patients with poor prognosis from those with better prognosis.
Affiliation(s)
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Saurabh Sinha
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
18
|
Selmansberger M, Michna A, Braselmann H, Höfig I, Schorpp K, Weber P, Anastasov N, Zitzelsberger H, Hess J, Unger K. Transcriptome network of the papillary thyroid carcinoma radiation marker CLIP2. Radiat Oncol 2020; 15:182. [PMID: 32727620 PMCID: PMC7392692 DOI: 10.1186/s13014-020-01620-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/17/2019] [Accepted: 07/15/2020] [Indexed: 11/29/2022]
Abstract
Background We present a functional gene association network of the CLIP2 gene, generated by de-novo reconstruction from transcriptomic microarray data. CLIP2 was previously identified as a potential marker for radiation-induced papillary thyroid carcinoma (PTC) of young patients in the aftermath of the Chernobyl reactor accident. Considering the rising thyroid cancer incidence rates in western societies, potentially related to medical radiation exposure, the functional characterization of CLIP2 is of relevance and contributes to the knowledge about radiation-induced thyroid malignancies. Methods We generated a transcriptomic mRNA expression data set from a CLIP2-perturbed thyroid cancer cell line (TPC-1) with induced CLIP2 mRNA overexpression and siRNA knockdown, respectively, followed by gene-association network reconstruction using the partial correlation-based approach GeneNet. Furthermore, we investigated different approaches for prioritizing differentially expressed genes for network reconstruction and compared the resulting networks with existing functional interaction networks from the Reactome, Biogrid and STRING databases. The derived CLIP2 interaction partners were validated at the transcript and protein levels. Results The best reconstructed network with regard to selection parameters contained a set of 20 genes in the first neighborhood of CLIP2 and suggests involvement of CLIP2 in the biological processes DNA repair/maintenance, chromosomal instability, promotion of proliferation and metastasis. Peptidylprolyl Isomerase Like 3 (PPIL3), previously identified as a potential direct interaction partner of CLIP2, was confirmed in this study by co-expression at the transcript and protein level. Conclusion In our study we present an optimized preselection approach for genes subjected to gene-association network reconstruction, which was applied to CLIP2 perturbation transcriptome data of a thyroid cancer cell culture model.
Our data support the potential carcinogenic role of CLIP2 overexpression in radiation-induced PTC and further suggest potential interaction partners of the gene.
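Partial correlation, the quantity underlying the network reconstruction named in this abstract, can be illustrated with a short Python sketch. It uses a plain matrix inverse of a covariance matrix; GeneNet itself relies on a regularized estimator suited to few samples and many genes, which is deliberately not reproduced here. The function name `partial_correlations` and the toy covariance are assumptions for illustration.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix from a covariance matrix.

    Uses pcor_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse covariance
    (precision) matrix. Plain inversion is a simplification: it requires a
    well-conditioned covariance, unlike a shrinkage-based estimator.
    """
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy covariance for a chain A - B - C: A and C correlate only through B.
cov = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.50],
                [0.25, 0.50, 1.00]])
pcor = partial_correlations(cov)
```

In this toy example the marginal correlation between A and C is 0.25, but their partial correlation given B vanishes, which is why partial-correlation networks tend to keep direct associations and drop indirect ones.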
Affiliation(s)
- Martin Selmansberger
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Agata Michna
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Herbert Braselmann
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Ines Höfig
  - Institute of Radiation Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Kenji Schorpp
  - Institute for Molecular Toxicology and Pharmacology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Peter Weber
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Natasa Anastasov
  - Institute of Radiation Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Horst Zitzelsberger
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
  - Department of Radiation Oncology, University Hospital, LMU Munich, Munich, Germany
  - Clinical Cooperation Group 'Personalized Radiotherapy in Head and Neck Cancer', Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Julia Hess
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
  - Department of Radiation Oncology, University Hospital, LMU Munich, Munich, Germany
  - Clinical Cooperation Group 'Personalized Radiotherapy in Head and Neck Cancer', Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
- Kristian Unger
  - Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
  - Department of Radiation Oncology, University Hospital, LMU Munich, Munich, Germany
  - Clinical Cooperation Group 'Personalized Radiotherapy in Head and Neck Cancer', Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764, Neuherberg, Germany
|
19
|
Randhawa V, Pathania S. Advancing from protein interactomes and gene co-expression networks towards multi-omics-based composite networks: approaches for predicting and extracting biological knowledge. Brief Funct Genomics 2020; 19:364-376. [PMID: 32678894 DOI: 10.1093/bfgp/elaa015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/16/2020] [Revised: 05/31/2020] [Accepted: 06/15/2020] [Indexed: 01/17/2023]
Abstract
Prediction of biological interaction networks from single-omics data has been extensively implemented to understand various aspects of biological systems. However, more recently, there is a growing interest in integrating multi-omics datasets for the prediction of interactomes that provide a global view of biological systems with higher descriptive capability, as compared to single omics. In this review, we have discussed various computational approaches implemented to infer and analyze two of the most important and well studied interactomes: protein-protein interaction networks and gene co-expression networks. We have explicitly focused on recent methods and pipelines implemented to infer and extract biologically important information from these interactomes, starting from utilizing single-omics data and then progressing towards multi-omics data. Accordingly, recent examples and case studies are also briefly discussed. Overall, this review will provide a proper understanding of the latest developments in protein and gene network modelling and will also help in extracting practical knowledge from them.
Affiliation(s)
- Vinay Randhawa
  - Department of Biochemistry, Panjab University, Chandigarh, 160014, India
- Shivalika Pathania
  - Department of Biotechnology, Panjab University, Chandigarh, 160014, India
|
20
|
Wong DCJ. Network aggregation improves gene function prediction of grapevine gene co-expression networks. PLANT MOLECULAR BIOLOGY 2020; 103:425-441. [PMID: 32266646 DOI: 10.1007/s11103-020-01001-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2019] [Accepted: 03/21/2020] [Indexed: 05/08/2023]
Abstract
Aggregation across multiple networks highlights robust co-expression interactions and improves the functional connectivity of grapevine gene co-expression networks. In recent years, the rapid accumulation of transcriptome datasets from diverse experimental conditions has enabled the widespread use of gene co-expression network (GCN) analysis in plants. In grapevine, GCN analysis has shown great promise for gene function prediction; however, measurable progress is currently lacking. Using accumulated microarray datasets from the grapevine whole-genome array (33 experiments, 1359 samples), we explored how meta-analysis through aggregation influences the functional connectivity (performance) of derived networks using guilt-by-association neighbor voting. Two annotation schemes, i.e. MapMan BIN and Pfam, at two sparsity thresholds, i.e. top 100 (stringent) and 300 (relaxed) ranked genes, were evaluated. We observed that aggregating across multiple networks improves performance dramatically, with the aggregate outperforming individual networks for the majority of functional terms. Network sparsity and size (i.e. the number of samples and aggregates) were key factors influencing performance, while the choice of annotation scheme had little effect. Systematic comparison with various state-of-the-art microarray and RNA-seq networks was also performed; however, none outperformed the aggregate microarray network despite having good predictive performance. Repeating this series of tests using a functional enrichment-based performance metric also showed findings remarkably consistent with guilt-by-association neighbor voting. To demonstrate its functionality, we explore the function and transcriptional regulation of grapevine EXPANSIN genes. We envisage that network aggregation will offer new and unique opportunities for gene function prediction in future grapevine functional genomics studies.
To this end, we make the aggregate networks and associated metadata publicly available at VTC-Agg (https://sites.google.com/view/vtc-agg).
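Network aggregation and guilt-by-association neighbor voting, as named in this abstract, can be sketched in Python under simplifying assumptions: each network is a symmetric gene-by-gene weight matrix, aggregation averages rank-standardized edge weights, and a gene's vote is the fraction of its total edge weight that goes to annotated genes. The function names are hypothetical, and the paper's sparsity thresholds and cross-validated scoring are not reproduced.

```python
import numpy as np

def rank_standardize(net):
    """Replace edge weights by ranks scaled to [0, 1] (ties broken arbitrarily)."""
    n = net.shape[0]
    iu = np.triu_indices(n, k=1)                  # each gene pair once
    ranks = net[iu].argsort().argsort().astype(float)
    ranks /= max(len(ranks) - 1, 1)
    out = np.zeros((n, n))
    out[iu] = ranks
    return out + out.T                            # symmetric, zero diagonal

def aggregate(networks):
    """Average the rank-standardized networks (assumed aggregation rule)."""
    return np.mean([rank_standardize(net) for net in networks], axis=0)

def neighbor_vote(net, labels):
    """Score per gene: edge weight to annotated genes / total edge weight."""
    total = net.sum(axis=1)
    votes = net @ labels.astype(float)
    return np.divide(votes, total, out=np.zeros_like(votes), where=total > 0)

# Two toy networks over three genes; only gene 0 carries the annotation.
net_a = np.array([[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]])
net_b = np.array([[0.0, 0.5, 0.3], [0.5, 0.0, 0.4], [0.3, 0.4, 0.0]])
agg = aggregate([net_a, net_b])
scores = neighbor_vote(agg, np.array([True, False, False]))
```

In practice the scores would be evaluated by hiding known annotations and checking how well they are recovered; that evaluation step is omitted here.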
Affiliation(s)
- Darren C J Wong
  - Ecology and Evolution, Research School of Biology, The Australian National University, Acton, ACT, 2601, Australia
|