1
Duan M, Liu Y, Zhao D, Li H, Zhang G, Liu H, Wang Y, Fan Y, Huang L, Zhou F. Gender-specific dysregulations of nondifferentially expressed biomarkers of metastatic colon cancer. Comput Biol Chem 2023; 104:107858. [PMID: 37058814 DOI: 10.1016/j.compbiolchem.2023.107858] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/20/2023] [Revised: 03/12/2023] [Accepted: 03/29/2023] [Indexed: 04/16/2023]
Abstract
Colon cancer is a common cancer type in both sexes and its mortality rate increases at the metastatic stage. Most studies exclude nondifferentially expressed genes from biomarker analysis of metastatic colon cancers. The motivation of this study is to find the latent associations of the nondifferentially expressed genes with metastatic colon cancers and to evaluate the gender specificity of such associations. This study formulates the expression level prediction of a gene as a regression model trained for primary colon cancers. The difference between a gene's predicted and original expression levels in a testing sample is defined as its mqTrans value (model-based quantitative measure of transcription regulation), which quantitatively measures the change of the gene's transcription regulation in this testing sample. We use the mqTrans analysis to detect the messenger RNA (mRNA) genes with nondifferential expression on their original expression levels but differentially expressed mqTrans values between primary and metastatic colon cancers. These genes are referred to as dark biomarkers of metastatic colon cancer. All dark biomarker genes were verified by two transcriptome profiling technologies, RNA-seq and microarray. The mqTrans analysis of a mixed cohort of both sexes could not recover gender-specific dark biomarkers. Most dark biomarkers overlap with long non-coding RNAs (lncRNAs), and these lncRNAs might have contributed their transcripts to calculating the dark biomarkers' expression levels. Therefore, mqTrans analysis serves as a complementary approach to identify dark biomarkers generally ignored by conventional studies, and it is essential to separate the female and male samples into two analysis experiments. The dataset and mqTrans analysis code are available at https://figshare.com/articles/dataset/22250536.
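The mqTrans idea described above (train a per-gene regression on primary tumours, then take predicted minus observed expression in a new sample) can be sketched as follows. The abstract does not specify the regression model or its predictors, so ordinary least squares on all other genes is an assumption made purely for illustration; `fit_gene_model` and `mqtrans_like` are hypothetical names, not the authors' code, which is distributed via the figshare link above.

```python
import numpy as np


def fit_gene_model(train_expr, target):
    """Fit ordinary least squares predicting gene `target` from all other genes.

    train_expr: (n_samples, n_genes) array of primary-tumour expression values.
    Returns (weights, intercept). OLS is an illustrative assumption.
    """
    X = np.delete(train_expr, target, axis=1)
    y = train_expr[:, target]
    # Append a column of ones so that lstsq also estimates an intercept.
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]


def mqtrans_like(train_expr, test_expr, target):
    """Predicted minus observed expression of `target` in each test sample."""
    w, b = fit_gene_model(train_expr, target)
    predicted = np.delete(test_expr, target, axis=1) @ w + b
    return predicted - test_expr[:, target]


# Toy data (hypothetical): gene 0 is an exact linear function of genes 1 and 2
# in the "primary" training cohort.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 3))
train[:, 0] = 2 * train[:, 1] + train[:, 2] + 1.0
test = rng.normal(size=(2, 3))
test[:, 0] = 2 * test[:, 1] + test[:, 2] + 1.0
test[1, 0] -= 3.0  # simulate a regulatory shift in the second test sample
scores = mqtrans_like(train, test, target=0)
```

In this toy run, `scores` is close to 0 for the unperturbed sample and close to 3 for the shifted one; a real analysis would then compare such values between primary and metastatic groups, separately for each sex.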
Affiliation(s)
- Meiyu Duan
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China; School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yaqing Liu
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Dong Zhao
  - School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China
- Haijun Li
  - School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China
- Gongyou Zhang
  - School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China
- Hongmei Liu
  - School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China; Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang 550025, Guizhou, China
- Yueying Wang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yusi Fan
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China; College of Software, Jilin University, Changchun, Jilin 130012, China
- Lan Huang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China; School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, Guizhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
2
Roberts AGK, Catchpoole DR, Kennedy PJ. Identification of differentially distributed gene expression and distinct sets of cancer-related genes identified by changes in mean and variability. NAR Genom Bioinform 2022; 4:lqab124. [PMID: 35047816 PMCID: PMC8759562 DOI: 10.1093/nargab/lqab124] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/25/2021] [Revised: 11/19/2021] [Accepted: 12/16/2021] [Indexed: 12/13/2022]
Abstract
There is increasing evidence that changes in the variability or overall distribution of gene expression are important both in normal biology and in diseases, particularly cancer. Genes whose expression differs in variability or distribution without a difference in mean are ignored by traditional differential expression-based analyses. Using a Bayesian hierarchical model that provides tests for both differential variability and differential distribution for bulk RNA-seq data, we report here an investigation into differential variability and distribution in cancer. Analysis of eight paired tumour-normal datasets from The Cancer Genome Atlas confirms that differential variability and distribution analyses are able to identify cancer-related genes. We further demonstrate that differential variability identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential variability identify functionally distinct sets of potentially cancer-related genes. These results suggest that differential variability analysis may provide insights into genetic aspects of cancer that would not be revealed by differential expression, and that differential distribution analysis may allow for more comprehensive identification of cancer-related genes than analyses based on changes in mean or variability alone.
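To make the contrast between a shift in mean and a change in distribution concrete, the sketch below compares one hypothetical gene in two groups using a two-sample Kolmogorov-Smirnov statistic. This is a deliberately simpler, swapped-in frequentist measure, not the Bayesian hierarchical model the authors use, and the data are invented for illustration.

```python
from bisect import bisect_right
from statistics import mean


def ecdf(sorted_vals, x):
    """Fraction of values <= x in an already-sorted list."""
    return bisect_right(sorted_vals, x) / len(sorted_vals)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    # Both ECDFs are right-continuous step functions, so the supremum of the
    # gap is reached at one of the pooled observations.
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)


# Hypothetical expression values for one gene: same mean, different spread.
normal = [-1.0, -0.5, 0.0, 0.5, 1.0]
tumour = [-4.0, -2.0, 0.0, 2.0, 4.0]
mean_shift = mean(tumour) - mean(normal)  # 0.0: invisible to a mean-based test
d = ks_statistic(normal, tumour)          # 0.4: the distributions differ
```

A mean-based comparison sees nothing here, while the distribution-level statistic does; the paper's model additionally separates the variability component from the mean component, which this sketch does not attempt.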
3
Liany H, Rajapakse JC, Karuturi RKM. MultiDCoX: Multi-factor analysis of differential co-expression. BMC Bioinformatics 2017; 18:576. [PMID: 29297310 PMCID: PMC5751780 DOI: 10.1186/s12859-017-1963-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/15/2023]
Abstract
Background: Differential co-expression (DCX) signifies a change in the degree of co-expression of a set of genes among different biological conditions. It has been used to identify differential co-expression networks or interactomes. Many algorithms have been developed for single-factor differential co-expression analysis and applied in a variety of studies. However, in many studies the samples are characterized by multiple factors, such as genetic markers, clinical variables and treatments, and no algorithm or methodology is available for multi-factor analysis of differential co-expression. Results: We developed a novel formulation and a computationally efficient greedy search algorithm called MultiDCoX to perform multi-factor differential co-expression analysis. Simulated data analysis demonstrates that the algorithm can effectively elicit differentially co-expressed (DCX) gene sets and quantify the influence of each factor on co-expression. MultiDCoX analysis of a breast cancer dataset identified biologically meaningful DCX gene sets, along with the genetic and clinical factors that influenced the respective differential co-expression. Conclusions: MultiDCoX is a space- and time-efficient procedure that identifies differentially co-expressed gene sets and the influence of individual factors on differential co-expression. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1963-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Herty Liany
  - School of Computing, National University of Singapore, 21 Lower Kent Ridge Rd, Singapore, 119077, Singapore; Computational and System Biology, Genome Institute of Singapore, A-STAR, 60 Biopolis Street, Singapore, 138672, Singapore
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798, Singapore
- R Krishna Murthy Karuturi
  - Computational and System Biology, Genome Institute of Singapore, A-STAR, 60 Biopolis Street, Singapore, 138672, Singapore; The Jackson Laboratory, 10 Discovery Dr, Farmington, CT, 06032, USA
4
Module Based Differential Coexpression Analysis Method for Type 2 Diabetes. Biomed Res Int 2015; 2015:836929. [PMID: 26339648 PMCID: PMC4538423 DOI: 10.1155/2015/836929] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/04/2014] [Accepted: 12/29/2014] [Indexed: 11/24/2022]
Abstract
More and more studies have shown that many complex diseases arise from the joint contribution of alterations in numerous genes. Genes often act together in a functional biological pathway or network and are highly correlated. Differential coexpression analysis, a more comprehensive complement to differential expression analysis, was proposed to study the gene regulatory networks and biological pathways underlying phenotypic changes by measuring changes in gene correlation between disease and normal conditions. In this paper, we propose a gene differential coexpression analysis algorithm at the level of gene sets and apply the algorithm to a publicly available type 2 diabetes (T2D) expression dataset. First, we calculate biweight midcorrelation coefficients between all gene pairs. Then, we select informative correlation pairs using the “differential coexpression threshold” strategy. Finally, we identify the differential coexpression gene modules using the maximum clique concept and the k-clique algorithm. We apply the proposed differential coexpression analysis method to simulated data and T2D data. Two differential coexpression gene modules related to T2D were detected, which should be useful for exploring the biological function of the related genes.
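The first two steps (biweight midcorrelation, then thresholding the change in correlation between conditions) can be sketched in plain Python. The bicor formula follows the common definition with a median/MAD scale constant of 9; the abstract does not say how the "differential coexpression threshold" is chosen, so a fixed absolute-difference cut-off is assumed here, and the clique-finding step is not shown. Function names and toy data are hypothetical.

```python
from statistics import median


def _bicor_terms(x):
    """Median-centred values down-weighted with Tukey's biweight."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    terms = []
    for v in x:
        u = (v - med) / (9 * mad)
        weight = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * weight)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length expression vectors."""
    a, b = _bicor_terms(x), _bicor_terms(y)
    num = sum(p * q for p, q in zip(a, b))
    den = (sum(p * p for p in a) * sum(q * q for q in b)) ** 0.5
    return num / den


def differential_pairs(normal, disease, threshold):
    """Gene pairs whose bicor changes by more than `threshold` (assumed rule).

    normal, disease: dict mapping gene name -> list of values over samples.
    """
    genes = sorted(normal)
    hits = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            delta = abs(bicor(normal[g], normal[h]) - bicor(disease[g], disease[h]))
            if delta > threshold:
                hits.append((g, h, delta))
    return hits


# Hypothetical toy data: B tracks A in normal samples but is reversed in disease.
normal = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [3, 1, 4, 1, 5]}
disease = {"A": [1, 2, 3, 4, 5], "B": [10, 8, 6, 4, 2], "C": [3, 1, 4, 1, 5]}
pairs = differential_pairs(normal, disease, threshold=1.0)
```

Here only the pair (A, B) is returned, with a change in bicor of about 2 (from +1 to -1). In the method described above, the selected pairs would then form the graph searched for cliques.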
5
Hejblum BP, Skinner J, Thiébaut R. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLoS Comput Biol 2015; 11:e1004310. [PMID: 26111374 PMCID: PMC4482329 DOI: 10.1371/journal.pcbi.1004310] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 11/18/2014] [Accepted: 04/30/2015] [Indexed: 01/13/2023]
Abstract
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied to gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here extends gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It makes use of all available repeated measurements while handling unbalanced data due to measurements missing at random (MAR). TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, whereas both a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series failed to detect any significant pattern change over time. When applied to the second dataset, TcGSA identified 4 gene sets that were ultimately found to be linked with the influenza vaccine as well, although previous analyses had associated them with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package.
Gene set analysis methods use prior biological knowledge to analyze gene expression data. This prior knowledge takes the form of predefined groups of genes, linked through their biological function. Gene set analysis methods have been successfully applied in cross-sectional studies, their results being more sensitive and interpretable than those of methods investigating genomic data one gene at a time. The time-course gene set analysis (TcGSA) introduced here extends such gene set analysis to longitudinal data. This method identifies a priori defined groups of genes whose expression is not stable over time, taking into account the potential heterogeneity between patients and between genes. When biological conditions are compared, it identifies the gene sets that have different expression dynamics according to these conditions. Data from two studies are analyzed: an HIV therapeutic vaccine trial, and a recent study on influenza and pneumococcal vaccines. In both cases, TcGSA provided new insights compared to standard approaches thanks to its increased sensitivity. These results highlight the benefits of the TcGSA method for analyzing gene expression dynamics.
Affiliation(s)
- Boris P. Hejblum
  - Univ. Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INSERM, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INRIA, Team SISTM, F-33000 Bordeaux, France
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
- Jason Skinner
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
- Rodolphe Thiébaut
  - Univ. Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INSERM, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INRIA, Team SISTM, F-33000 Bordeaux, France
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
6
Hernández S, Franco L, Calvo A, Ferragut G, Hermoso A, Amela I, Gómez A, Querol E, Cedano J. Bioinformatics and Moonlighting Proteins. Front Bioeng Biotechnol 2015; 3:90. [PMID: 26157797 PMCID: PMC4478894 DOI: 10.3389/fbioe.2015.00090] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/03/2015] [Accepted: 06/10/2015] [Indexed: 01/25/2023]
Abstract
Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Moonlighting proteins are usually revealed experimentally by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially given the large amounts of sequence data from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein–protein interaction (PPI) databases, (d) matching the query protein sequence to 3D databases (e.g., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains mainly detect the canonical function but usually fail to detect the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics (PPI) databases has the best performance. Structural information and mutation correlation analysis can help map the functional sites. Mutation correlation analysis can only be used in very specific situations, since it requires multiple alignments of protein family sequences, but it can suggest how the evolutionary process of second-function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses.
Affiliation(s)
- Sergio Hernández
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Luís Franco
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Alejandra Calvo
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
- Gabriela Ferragut
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
- Antoni Hermoso
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Isaac Amela
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Antonio Gómez
  - Cancer Epigenetics and Biology Program, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet de Llobregat, Barcelona, Spain
- Enrique Querol
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Juan Cedano
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
7
Kayano M, Shiga M, Mamitsuka H. Detecting Differentially Coexpressed Genes from Labeled Expression Data: A Brief Review. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:154-167. [PMID: 26355515 DOI: 10.1109/tcbb.2013.2297921] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
We review methods for capturing differential coexpression, which can be divided into two cases by the size of the gene sets: 1) two paired genes and 2) multiple genes. In the first case, two genes are positively correlated with each other under one condition and negatively correlated under the other. In the second case, multiple genes are coexpressed under one condition and randomly expressed under the other. We group the variety of methods for the first and second cases into four and three approaches, respectively. We describe each of these approaches in technical detail, followed by thorough comparative experiments with both synthetic and real data sets. Our experimental results suggest that there is considerable room to improve the efficiency of current methods, particularly in the case of multiple genes, because the best-performing methods, which are relatively simple and intuitive, still achieve low performance.
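For the two-gene case, one widely used way to test whether a pair's correlation differs between conditions is Fisher's z-transformation of the two Pearson coefficients. The sketch below is offered only as a concrete illustration of that case; it is not a reproduction of any specific method compared in the review, and it assumes independent samples in the two conditions and correlations strictly between -1 and 1. Gene values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(x1, y1, x2, y2):
    """Compare the correlation of a gene pair between two independent conditions.

    Returns (r1, r2, z, two-sided p) using Fisher's z-transformation.
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = math.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return r1, r2, z, p


# Hypothetical gene pair: positively correlated in condition 1, negatively in condition 2.
g1 = [1, 2, 3, 4, 5, 6, 7, 8]
g2_cond1 = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9]
g2_cond2 = [7.8, 7.1, 6.2, 4.9, 4.1, 3.2, 1.8, 1.2]
r1, r2, z, p = fisher_z_test(g1, g2_cond1, g1, g2_cond2)
```

In a genome-wide scan this test would be repeated over many pairs, so the resulting p-values would also need multiple-testing correction.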
8
Petri T, Küfner R, Zimmer R. Experiment specific expression patterns. J Comput Biol 2011; 18:1423-35. [PMID: 21919744 DOI: 10.1089/cmb.2011.0159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
The differential analysis of genes between microarrays from several experimental conditions or treatments routinely estimates which genes change significantly between groups. As genes are never regulated individually, observed behavior may be a consequence of changes in other genes. Existing approaches like co-expression analysis aim to resolve such patterns from a wide range of experiments. The knowledge of such a background set of experiments can be used to compute expected gene behavior based on known links. It is particularly interesting to detect previously unseen specific effects in other experiments. Here, a new method to spot genes deviating from expected behavior (PAttern DEviation SCOring--Padesco) is devised. It uses linear regression models learned from a background set to arrive at gene specific prediction accuracy distributions. For a given experiment, it is then decided whether each gene is predicted better or worse than expected. This provides a novel way to estimate the experiment specificity of each gene. We propose a validation procedure to estimate the detection of such specific candidates and show that these can be identified with an average accuracy of about 85%.
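The Padesco idea (learn a linear model per gene from a background set, build that gene's own error distribution, then ask whether a new experiment is predicted better or worse than expected) can be sketched with NumPy. The abstract does not give the exact accuracy measure or decision rule, so absolute prediction error, leave-one-out errors on the background as the gene-specific distribution, and an empirical tail fraction are all assumptions; `deviation_score` and the data are hypothetical.

```python
import numpy as np


def _fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def deviation_score(background, new_sample, gene):
    """Empirical tail fraction of a gene's prediction error in a new experiment.

    background: (n_experiments, n_genes) array used to learn the regression.
    Returns the fraction of leave-one-out background errors that are at least as
    large as the new error; values near 0 mean "predicted worse than expected".
    """
    X = np.delete(background, gene, axis=1)
    y = background[:, gene]
    n = len(background)
    loo_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        loo_errors[i] = abs(_fit_predict(X[keep], y[keep], X[i:i + 1])[0] - y[i])
    x_new = np.delete(new_sample, gene)[None, :]
    new_error = abs(_fit_predict(X, y, x_new)[0] - new_sample[gene])
    return float(np.mean(loo_errors >= new_error))


# Hypothetical background: gene 0 follows genes 1 and 2 up to small noise.
rng = np.random.default_rng(1)
bg = rng.normal(size=(40, 3))
bg[:, 0] = bg[:, 1] - bg[:, 2] + rng.normal(scale=0.1, size=40)
typical = np.array([0.0, 0.0, 0.0])   # consistent with the learned relation
specific = np.array([2.0, 0.0, 0.0])  # gene 0 far above its predicted value
score_typical = deviation_score(bg, typical, gene=0)
score_specific = deviation_score(bg, specific, gene=0)
```

Here `score_specific` is 0 (no background error is as large), flagging gene 0 as experiment-specific in that sample, while `score_typical` stays well above 0.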
Affiliation(s)
- Tobias Petri
  - LMU Munich, Department of Informatics, Munich, Germany
9
de la Fuente A. From 'differential expression' to 'differential networking' - identification of dysfunctional regulatory networks in diseases. Trends Genet 2010; 26:326-33. [PMID: 20570387 DOI: 10.1016/j.tig.2010.05.001] [Citation(s) in RCA: 339] [Impact Index Per Article: 24.2] [Received: 02/05/2010] [Revised: 04/28/2010] [Accepted: 05/03/2010] [Indexed: 01/09/2023]
Abstract
Understanding diseases requires identifying the differences between healthy and affected tissues. Gene expression data have revolutionized the study of diseases by making it possible to simultaneously consider thousands of genes. The identification of disease-associated genes requires studying the genes in the context of the regulatory systems they are involved in. A major goal is to identify specific regulatory networks that are dysfunctional in a given disease state. Although we still have not reached a stage where the elucidation of differential regulatory networks is commonly feasible, recent advances have described the first steps towards this goal - the identification of differential coexpression networks. This review describes the shift from differential gene expression to differential networking and outlines how this shift will affect the study of the genetic basis of disease.
Affiliation(s)
- Alberto de la Fuente
  - CRS4 Bioinformatica, Polaris Edificio 3, Località Piscina Manna, 09010 Pula (CA), Italy
10
Zhang H, Song X, Wang H, Zhang X. MIClique: An algorithm to identify differentially coexpressed disease gene subset from microarray data. J Biomed Biotechnol 2010; 2009:642524. [PMID: 20169000 PMCID: PMC2822236 DOI: 10.1155/2009/642524] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/24/2009] [Accepted: 10/28/2009] [Indexed: 01/05/2023]
Abstract
Computational analysis of microarray data has provided an effective way to identify disease-related genes. Traditional disease gene selection methods for microarray data, such as statistical tests, focus on genes that are differentially expressed between samples by prioritizing genes individually. These traditional methods might miss differentially coexpressed (DCE) gene subsets because they ignore interactions between genes. In this paper, the MIClique algorithm is proposed to identify DCE gene subsets based on mutual information and clique analysis. Mutual information is used to measure the coexpression relationship between each pair of genes in two different kinds of samples. Clique analysis is a commonly used method for biological networks, in which cliques generally represent biological modules of similar function. By applying the MIClique algorithm to real gene expression data, DCE gene subsets that are correlated under one experimental condition but uncorrelated under another are detected from the graphs of the colon and leukemia datasets.
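A minimal sketch of the two ingredients named above, mutual information between gene pairs in each sample class and clique analysis of the resulting graph, is given below. The equal-width discretization, the plug-in MI estimator, the rule that links two genes when their MI differs by more than a fixed threshold, and the basic Bron-Kerbosch enumeration are all assumptions chosen for brevity; the abstract does not specify MIClique's actual choices. Names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Equal-width binning of a numeric vector into integer labels 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant vectors
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Plug-in mutual information (bits) between two discretized vectors."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (without pivoting) of all maximal cliques."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def dce_cliques(cond_a, cond_b, threshold, bins=3, min_size=3):
    """Cliques in the graph linking gene pairs whose MI changes by > threshold."""
    genes = sorted(cond_a)
    adj = {g: set() for g in genes}
    for g, h in combinations(genes, 2):
        mi_a = mutual_information(cond_a[g], cond_a[h], bins)
        mi_b = mutual_information(cond_b[g], cond_b[h], bins)
        if abs(mi_a - mi_b) > threshold:
            adj[g].add(h)
            adj[h].add(g)
    return [c for c in maximal_cliques(adj) if len(c) >= min_size]


# Hypothetical data: A, B, C move together in condition a but not in condition b.
cond_a = {"A": [0, 0, 0, 1, 1, 1, 2, 2, 2], "B": [0, 0, 0, 1, 1, 1, 2, 2, 2],
          "C": [0, 0, 0, 1, 1, 1, 2, 2, 2], "D": [0, 1, 2, 0, 1, 2, 0, 1, 2]}
cond_b = {"A": [0, 0, 0, 1, 1, 1, 2, 2, 2], "B": [0, 1, 2, 0, 1, 2, 0, 1, 2],
          "C": [0, 1, 2, 1, 2, 0, 2, 0, 1], "D": [0, 2, 1, 1, 0, 2, 2, 1, 0]}
modules = dce_cliques(cond_a, cond_b, threshold=1.0)
```

In this toy example the only module returned is ["A", "B", "C"]: those genes share log2(3) bits pairwise in condition a and none in condition b, while D is unrelated to them in both conditions.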
Affiliation(s)
- Huanping Zhang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Xiaofeng Song
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Huinan Wang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Xiaobai Zhang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
11
Foley A. Cardiac lineage selection: integrating biological complexity into computational models. Wiley Interdiscip Rev Syst Biol Med 2009; 1:334-347. [DOI: 10.1002/wsbm.43] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Affiliation(s)
- Ann Foley
  - Greenberg Division of Cardiology, Weill Cornell Medical College, 1300 York Avenue, New York, NY 10065, USA
12
Mayburd AL. Expression variation: its relevance to emergence of chronic disease and to therapy. PLoS One 2009; 4:e5921. [PMID: 19526064 PMCID: PMC2692004 DOI: 10.1371/journal.pone.0005921] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/13/2009] [Indexed: 12/05/2022]
Abstract
Background: Stochastic fluctuations in protein turnover underlie the random emergence of neural precursor cells from an initially homogeneous cell population. If stochastic alteration of the levels in signal transduction networks is sufficient to spontaneously alter a phenotype, can it also cause a sporadic chronic disease, including cancer? Methods: Expression in >80 disease-free tissue environments was measured using the Affymetrix microarray platform comprising 54675 probe-sets. Steps were taken to suppress the technical noise inherent to microarray experiments. Next, the integrated expression and expression variability data were aligned with mechanistic data covering major human chronic diseases. Results: Measured as a class average, the expression variability of disease-associated genes measured in health was higher than that of random genes for all chronic pathologies. FDA-approved anti-cancer targets displayed much higher variability as a class compared to random genes. The same held for the magnitude of gene expression. The genes known to participate in multiple chronic disorders demonstrated the highest variability. Disease-related gene categories displayed, on average, more intricate regulation of biological function than the random reference, and were enriched in adaptive and transient functions as well as positive feedback relationships. Conclusions: A possible causative link can be suggested between normal (healthy) state gene expression variation and the inception of major human pathologies, including cancer. The study of variability profiles may lead to novel diagnostic methods, therapies and better drug target prioritization. The results of the study suggest the need to advance personalized therapy development.
13
Prieto C, Risueño A, Fontanillo C, De Las Rivas J. Human gene coexpression landscape: confident network derived from tissue transcriptomic profiles. PLoS One 2008; 3:e3911. [PMID: 19081792 PMCID: PMC2597745 DOI: 10.1371/journal.pone.0003911] [Citation(s) in RCA: 187] [Impact Index Per Article: 11.7] [Received: 07/03/2008] [Accepted: 11/05/2008] [Indexed: 12/12/2022]
Abstract
Background: Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global “omic” scale are not focused on human samples, and when they do involve human samples they very often include heterogeneous datasets, mixing normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and the estimation of errors in the data is not provided. Methodology/Principal Findings: Human genome-wide expression data from a controlled set of normal healthy tissues are used to build a confident human gene coexpression network, avoiding both pathological and technical noise. To achieve this, we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. Together, these methods provide a series of coexpression datasets in which the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a core subnetwork, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigation of their functional assignment indicates that more than 60% are housekeeping and essential genes. The network displays previously undescribed gene associations, and it allows some genes of unknown function to be placed in a functional context based on their interactions with known gene families. Conclusions/Significance: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making the validated human gene coexpression networks available to the scientific community to allow further analyses of the network or of specific gene associations. The data are available free online at http://bioinfow.dep.usal.es/coexpression/.
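One ingredient listed above, keeping only links that survive random cross-validation under both a parametric and a non-parametric correlation, can be sketched as follows. The correlation threshold, the half-split design, the number of rounds and the restriction to positive correlations are assumptions; the sketch does not reproduce the authors' normalization, their error-rate estimation via pathway assignment, or any of their parameter values. `stable_links`, the gene names and the data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def stable_links(expr, threshold=0.8, rounds=20, seed=0):
    """Gene pairs whose Pearson AND Spearman correlations exceed `threshold`
    in every one of `rounds` random half-samples.

    expr: dict mapping gene name -> list of expression values over samples.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    n = len(expr[genes[0]])
    links = {(g, h) for i, g in enumerate(genes) for h in genes[i + 1:]}
    for _ in range(rounds):
        idx = rng.sample(range(n), n // 2)
        sub = {g: [expr[g][k] for k in idx] for g in genes}
        links = {(g, h) for g, h in links
                 if pearson(sub[g], sub[h]) > threshold
                 and spearman(sub[g], sub[h]) > threshold}
    return sorted(links)


# Hypothetical data: B rises with A; C is unrelated to both.
expr = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "B": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 14.2, 15.9, 18.1, 19.8, 22.2, 24.0],
    "C": [5, 11, 2, 8, 1, 12, 7, 3, 10, 4, 9, 6],
}
links = stable_links(expr)
```

Requiring agreement between a parametric and a rank-based coefficient on every random split trades sensitivity for confidence, which matches the "confident network" goal stated above; here only the pair (A, B) survives.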
Affiliation(s)
- Carlos Prieto
  - Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
- Alberto Risueño
  - Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
- Celia Fontanillo
  - Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
- Javier De Las Rivas
  - Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
14
Ho JWK, Stefani M, dos Remedios CG, Charleston MA. Differential variability analysis of gene expression and its application to human diseases. Bioinformatics 2008; 24:i390-8. [PMID: 18586739 PMCID: PMC2718620 DOI: 10.1093/bioinformatics/btn142] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Current microarray analyses focus on identifying sets of genes that are differentially expressed (DE) or differentially coexpressed (DC) in different biological states (e.g. diseased versus non-diseased). We observed that in many human diseases, some genes have a significant increase or decrease in expression variability (variance). As these observed changes in expression variability may be caused by alteration of the underlying expression dynamics, such differential variability (DV) patterns are also biologically interesting. RESULTS: Here we propose a novel analysis of changes in gene expression variability between groups of samples, which we call differential variability analysis. We introduce the concept of differential variability (DV) and present a simple procedure for identifying DV genes from microarray data. Our procedure is evaluated with simulated and real microarray datasets. The effect of data preprocessing methods on the identification of DV genes is investigated. The biological significance of DV analysis is demonstrated with four human disease datasets. The relationships among DV, DE and DC genes are investigated. The results suggest that changes in expression variability are associated with changes in coexpression pattern, which implies that DV is not merely stochastic noise but an informative signal. AVAILABILITY: The R source code for differential variability analysis is available from the contact authors upon request.
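The abstract describes a "simple procedure" for finding DV genes without giving its details. As a stand-in illustration of testing one gene for a change in variance between groups, the sketch below computes the Brown-Forsythe statistic (Levene's test centred on group medians) in plain Python; it is not claimed to be the authors' procedure, and the expression values are hypothetical. Turning the statistic into a p-value requires the F(k - 1, n - k) distribution, for example via SciPy's `scipy.stats.levene(..., center='median')`, which is omitted here to keep the sketch dependency-free.

```python
from statistics import mean, median


def brown_forsythe(*groups):
    """Brown-Forsythe statistic (Levene's test centred on group medians).

    A one-way ANOVA F statistic computed on absolute deviations from each
    group's median; large values suggest the groups differ in variability.
    """
    dev = []
    for g in groups:
        m = median(g)
        dev.append([abs(v - m) for v in g])
    k = len(dev)
    n = sum(len(d) for d in dev)
    grand = mean([v for d in dev for v in d])
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in dev) / (k - 1)
    within = sum((v - mean(d)) ** 2 for d in dev for v in d) / (n - k)
    return between / within


# Hypothetical expression of one gene: similar means, very different spread.
healthy = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0]
diseased = [3.0, 7.0, 4.0, 6.5, 2.5, 7.5]
f_dv = brown_forsythe(healthy, diseased)
f_same = brown_forsythe(healthy, [v + 1.0 for v in healthy])  # shifted, same spread
```

The shifted copy of `healthy` has a different mean but the same spread, so `f_same` is essentially 0, whereas `f_dv` is large. This is the DV-versus-DE distinction the abstract draws.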
Affiliation(s)
- Joshua W K Ho
  - School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia