1
Ahsan N, Kataya ARA, Rao RSP, Swatek KN, Wilson RS, Meyer LJ, Tovar-Mendez A, Stevenson S, Maszkowska J, Dobrowolska G, Yao Q, Xu D, Thelen JJ. Decoding Arabidopsis thaliana CPK/SnRK Superfamily Kinase Client Signaling Networks Using Peptide Library and Mass Spectrometry. Plants (Basel) 2024; 13:1481. [PMID: 38891291 PMCID: PMC11174488 DOI: 10.3390/plants13111481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/08/2024] [Accepted: 05/17/2024] [Indexed: 06/21/2024]
Abstract
Members of the calcium-dependent protein kinase (CDPK/CPK) and SNF-related protein kinase (SnRK) superfamilies are commonly found in plants and some protists. Our knowledge of the client specificity of the members of this superfamily is fragmentary. As this family is represented by over 30 members in Arabidopsis thaliana, the identification of kinase-specific and overlapping client relationships is crucial to our understanding of the nuances of this large family of kinases as directed towards signal transduction pathways. Herein, we used the kinase client (KiC) assay (a relative, quantitative, high-throughput, mass spectrometry-based in vitro phosphorylation assay) to identify and characterize potential CPK/SnRK targets of Arabidopsis. Eight CPKs (1, 3, 6, 8, 17, 24, 28, and 32), four SnRKs (subclasses 1 and 2), and PPCK1 and PPCK2 were screened against a synthetic peptide library that contains 2095 peptides and 2661 known phosphorylation sites. A total of 625 in vitro phosphorylation sites corresponding to 203 non-redundant proteins were identified. The most promiscuous kinase, CPK17, had 105 candidate target proteins, many of which had already been discovered. Sequence analysis of the identified phosphopeptides revealed four motifs, LxRxxS, RxxSxxR, RxxS, and LxxxxS, that were significantly enriched among CPK/SnRK clients. The results provide insight into both CPK- and SnRK-specific and overlapping signaling network architectures and recapitulate many known in vivo relationships, validating this large-scale approach towards discovering kinase targets.
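In the four enriched motifs, x denotes any residue and S is the phosphorylated serine. As a minimal sketch, assuming each motif is read with its S aligned on a known phosphosite, the motifs can be turned into regular expressions and tested around that site. The function names and the example peptide are hypothetical; this is not the sequence-analysis procedure used in the study.

```python
import re

# Motifs reported as enriched among CPK/SnRK clients; "x" stands for any residue.
MOTIFS = ["LxRxxS", "RxxSxxR", "RxxS", "LxxxxS"]


def motif_to_regex(motif: str) -> str:
    """Translate motif notation into a regex: each 'x' becomes '.' (any residue)."""
    return motif.replace("x", ".")


def motifs_at_site(peptide: str, site: int) -> list[str]:
    """Return the motifs whose serine lines up with the residue at `site`.

    `site` is a 0-based index into `peptide` (a hypothetical convention).
    Non-serine sites return an empty list.
    """
    if peptide[site] != "S":
        return []
    hits = []
    for motif in MOTIFS:
        s_offset = motif.index("S")  # where the phosphoserine sits in the motif
        start = site - s_offset
        end = start + len(motif)
        if start < 0 or end > len(peptide):
            continue  # the motif would extend past either end of the peptide
        if re.fullmatch(motif_to_regex(motif), peptide[start:end]):
            hits.append(motif)
    return hits


print(motifs_at_site("MLARQVSDKEE", 6))
# prints ['LxRxxS', 'RxxS', 'LxxxxS']
```

Because RxxS is the C-terminal half of LxRxxS, a site can match several motifs at once, so overlapping hits are expected rather than an error.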
Affiliation(s)
- Nagib Ahsan
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Department of Chemistry and Biochemistry, Mass Spectrometry, Proteomics and Metabolomics Core Facility, Stephenson Life Sciences Research Center, The University of Oklahoma, Norman, OK 73019, USA
- Amr R. A. Kataya
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- R. Shyama Prasad Rao
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Center for Bioinformatics, NITTE Deemed to be University, Mangaluru 575018, India
- Kirby N. Swatek
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Medical Research Council Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dundee DD1 5EH, UK
- Rashaun S. Wilson
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Arvinas, Inc., New Haven, CT 06511, USA
- Louis J. Meyer
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Bayer Crop Science, St. Louis, MO 63141, USA
- Alejandro Tovar-Mendez
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Elemental Enzymes, St. Louis, MO 63132, USA
- Severin Stevenson
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Justyna Maszkowska
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul. Pawińskiego 5a, 02-106 Warsaw, Poland
- Grazyna Dobrowolska
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul. Pawińskiego 5a, 02-106 Warsaw, Poland
- Qiuming Yao
- Department of Electrical Engineering & Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Electrical Engineering & Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Jay J. Thelen
- Division of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
2
Wang J, Wan YW, Al-Ouran R, Huang M, Liu Z. CoRegNet: unraveling gene co-regulation networks from public RNA-Seq repositories using a beta-binomial statistical model. Brief Bioinform 2023; 25:bbad380. [PMID: 38113079 PMCID: PMC10729864 DOI: 10.1093/bib/bbad380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/13/2023] [Indexed: 12/21/2023]
Abstract
Millions of RNA sequencing samples have been deposited into public databases, providing a rich resource for biological research. These datasets encompass tens of thousands of experiments and offer comprehensive insights into human cellular regulation. However, a major challenge is how to integrate these experiments, which were acquired under different conditions. We propose a new statistical tool based on beta-binomial distributions that can construct a robust gene co-regulation network (CoRegNet) across tens of thousands of experiments. Our analysis of over 12 000 experiments involving human tissues and cells shows that CoRegNet significantly outperforms existing gene co-expression-based methods. Although the majority of the genes are linearly co-regulated, we did discover an interesting set of genes that are non-linearly co-regulated: half of the time they change in the same direction, and the other half they change in the opposite direction. Additionally, we identified a set of gene pairs that follow Simpson's paradox. By utilizing public domain data, CoRegNet offers a powerful approach for identifying functionally related gene pairs, thereby revealing new biological insights.
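The abstract does not spell out the model. As a hypothetical sketch only, suppose that for one gene pair we count k experiments, out of n in which both genes change, where the two genes move in the same direction, and that under a null of no co-regulation k follows a beta-binomial distribution with parameters alpha and beta. The upper-tail probability P(K >= k) can then act as a p-value for co-regulation. The parameter values are placeholders, not values from CoRegNet, and this is not presented as the published method.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def concordance_pvalue(k: int, n: int, alpha: float, beta: float) -> float:
    """Upper tail P(K >= k): probability of at least k same-direction changes
    out of n experiments under a beta-binomial null (hypothetical setup)."""
    return sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))


# Placeholder parameters; alpha = beta = 1 makes the null uniform on 0..n.
print(round(concordance_pvalue(9, 10, 1.0, 1.0), 4))
# prints 0.1818
```

Compared with a plain binomial, the beta-binomial allows extra variance in the concordance rate, which is one reason such a model can suit heterogeneous experiments; in practice alpha and beta would have to be estimated from the data.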
Affiliation(s)
- Jiasheng Wang
- Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX 77030, USA
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
- Ying-Wooi Wan
- Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Howard Hughes Medical Institute, Houston, TX 77030, USA
- Meichen Huang
- Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX 77030, USA
- Department of Neurology, Baylor College of Medicine, Houston, TX 77030, USA
- Zhandong Liu
- Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX 77030, USA
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
3
Computational principles and challenges in single-cell data integration. Nat Biotechnol 2021; 39:1202-1215. [PMID: 33941931 DOI: 10.1038/s41587-021-00895-7] [Citation(s) in RCA: 158] [Impact Index Per Article: 52.7] [Received: 11/06/2020] [Accepted: 03/16/2021] [Indexed: 02/07/2023]
Abstract
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term 'data integration' has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.
4
Rothenhäusler D, Meinshausen N, Bühlmann P, Peters J. Anchor regression: Heterogeneous data meet causality. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12398] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
Affiliation(s)
- Jonas Peters
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
5
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.
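Compositionality means that only relative abundances within a sample carry information, so correlations computed on raw proportions can be spurious. A common remedy, shown here as a generic sketch rather than a method taken from this chapter, is the centered log-ratio (CLR) transform, log(x_i / g(x)), where g(x) is the geometric mean of the sample. A pseudocount is added because log(0) is undefined for the excess zeros typical of microbiome counts; the value 0.5 is an arbitrary choice.

```python
from math import log


def clr(counts: list[float], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    Adds `pseudocount` to every count so that zeros do not give log(0);
    the default of 0.5 is an arbitrary illustrative choice.
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


sample = [0, 3, 10, 25]  # made-up counts with one zero
print([round(v, 3) for v in clr(sample)])
```

Each transformed sample sums to zero, and rescaling all counts in a sample leaves the result unchanged (exactly so when no pseudocount is needed), so differences in sequencing depth drop out before correlations are computed.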
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
6
Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022]
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident whether the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they affect data interpretation. In this review, we survey some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the multi-omics integration of microbiome data. We compare the advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight ongoing statistical challenges and opportunities for integrative network analysis of microbiome data.
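Network methods of the kind surveyed here often distinguish marginal correlation from partial correlation, which removes the influence of all other variables on a pair. For reference, a standard identity (not a result of this review) expresses the partial correlation between variables i and j, given all remaining variables, through the precision matrix, where Σ is the covariance matrix:

```latex
\rho_{ij \mid \text{rest}} = -\frac{\Theta_{ij}}{\sqrt{\Theta_{ii}\,\Theta_{jj}}},
\qquad \Theta = \Sigma^{-1}
```

Under a Gaussian model, a zero entry Θ_ij therefore means that i and j are conditionally independent, which is why sparse precision-matrix estimates are used to draw such networks.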
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
7
Slieker RC, van der Heijden AAWA, van Leeuwen N, Mei H, Nijpels G, Beulens JWJ, 't Hart LM. HbA1c is associated with altered expression in blood of cell cycle- and immune response-related genes. Diabetologia 2018; 61:138-146. [PMID: 29159468 PMCID: PMC6448931 DOI: 10.1007/s00125-017-4467-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/30/2017] [Accepted: 09/01/2017] [Indexed: 12/22/2022]
Abstract
AIMS/HYPOTHESIS: Individuals with type 2 diabetes are heterogeneous in their glycaemic control, as tracked by blood HbA1c levels. Here, we investigated the extent to which gene expression levels in blood reflect current and future HbA1c levels.
METHODS: HbA1c levels at baseline and at 1 and 2 year follow-up were compared with gene expression levels in 391 individuals with type 2 diabetes from the Hoorn Diabetes Care System Cohort (15,564 genes, RNA sequencing). The functions of associated baseline genes were investigated further using pathway enrichment analysis. Using publicly available data, we investigated whether the identified genes are also associated with HbA1c in the target tissues, muscle and pancreas.
RESULTS: At baseline, 220 genes (1.4%) were associated with baseline HbA1c. Identified genes were enriched for cell cycle and complement system activation pathways. The association of 15 genes extended to the target tissues, muscle (n = 113) and pancreatic islets (n = 115). At follow-up, expression of 25 genes (0.16%) was associated with 1 year HbA1c, and expression of nine genes (0.06%) with 2 year HbA1c. Five genes overlapped across all time points, and 18 additional genes overlapped between baseline and 1 year follow-up. After adjustment for baseline HbA1c, the number of significant genes at 1 and 2 years decreased markedly, suggesting that gene expression levels in whole blood reflect the current glycaemic state but not necessarily the future glycaemic state.
CONCLUSIONS/INTERPRETATION: HbA1c levels in individuals with type 2 diabetes are associated with expression levels of genes linked to the cell cycle and complement system activation.
Affiliation(s)
- Roderick C Slieker
- Department of Molecular Cell Biology, Leiden University Medical Center, Postal Box 9600, 2300 RC, Leiden, the Netherlands
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, the Netherlands
- Amber A W A van der Heijden
- Department of General Practice and Elderly Care Medicine, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, the Netherlands
- Nienke van Leeuwen
- Department of Molecular Cell Biology, Leiden University Medical Center, Postal Box 9600, 2300 RC, Leiden, the Netherlands
- Hailiang Mei
- Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
- Giel Nijpels
- Department of General Practice and Elderly Care Medicine, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, the Netherlands
- Joline W J Beulens
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, the Netherlands
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
- Leen M 't Hart
- Department of Molecular Cell Biology, Leiden University Medical Center, Postal Box 9600, 2300 RC, Leiden, the Netherlands
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, the Netherlands
- Molecular Epidemiology Section, Leiden University Medical Center, Leiden, the Netherlands
8
Padayachee T, Khamiakova T, Shkedy Z, Perola M, Salo P, Burzykowski T. The Detection of Metabolite-Mediated Gene Module Co-Expression Using Multivariate Linear Models. PLoS One 2016; 11:e0150257. [PMID: 26918614 PMCID: PMC4769021 DOI: 10.1371/journal.pone.0150257] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/11/2015] [Accepted: 02/11/2016] [Indexed: 12/29/2022]
Abstract
Investigating whether metabolites regulate the co-expression of a predefined gene module is one of the relevant questions posed in the integrative analysis of metabolomic and transcriptomic data. This article concerns the integrative analysis of the two high-dimensional datasets by means of multivariate models and statistical tests for the dependence between metabolites and the co-expression of a gene module. The general linear model (GLM) for correlated data that we propose models the dependence between adjusted gene expression values through a block-diagonal variance-covariance structure formed by metabolic-subset-specific general variance-covariance blocks. The performance of statistical tests for the inference of conditional co-expression is evaluated through a simulation study. The proposed methodology is applied to the gene expression data of the previously characterized lipid-leukocyte module. Our results show that the GLM approach improves on a previous approach by being less prone to the detection of spurious conditional co-expression.
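In a block-diagonal variance-covariance matrix, observations within the same block may covary, while every covariance between blocks is fixed at zero. The sketch below only assembles such a matrix from blocks that are assumed to be given; how the paper defines and estimates its metabolic-subset-specific blocks is not reproduced, and the numbers in the example are made up.

```python
def block_diagonal(blocks: list[list[list[float]]]) -> list[list[float]]:
    """Place square covariance blocks along the diagonal of one matrix.

    Every entry outside the blocks is zero, i.e. no covariance is allowed
    between observations that belong to different blocks.
    """
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        k = len(b)
        for i in range(k):
            if len(b[i]) != k:
                raise ValueError("each block must be square")
            for j in range(k):
                out[offset + i][offset + j] = float(b[i][j])
        offset += k
    return out


# Two made-up blocks: a 2x2 block with covariance 0.5 and a 1x1 block.
for row in block_diagonal([[[1.0, 0.5], [0.5, 1.0]], [[2.0]]]):
    print(row)
```

In numerical work the same structure is usually built with `scipy.linalg.block_diag`, which takes the blocks as arrays and returns a NumPy matrix.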
Affiliation(s)
- Trishanta Padayachee
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat), Hasselt University, Diepenbeek, Belgium
- Tatsiana Khamiakova
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat), Hasselt University, Diepenbeek, Belgium
- Ziv Shkedy
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat), Hasselt University, Diepenbeek, Belgium
- Markus Perola
- Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland
- Perttu Salo
- Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Finland
- Tomasz Burzykowski
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat), Hasselt University, Diepenbeek, Belgium
9
Abstract
Gene coexpression networks inferred by correlation from high-throughput profiling such as microarray data represent simple but effective structures for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed to tackle the problem of deciding when the resulting correlation values are statistically significant. This is most crucial when the number of samples is small, yielding a non-negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard thresholding solution based on the assumption that a coexpression network inferred by randomly generated data is expected to be empty. The threshold is theoretically derived by means of an analytic approach and, as a deterministic independent null model, it depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution compatible with the structure of gene expression levels data. We show, on synthetic and array datasets, that the proposed threshold is effective in eliminating all false positive links, with an offsetting cost in terms of false negative detected edges.
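The threshold in this work is derived analytically from the dimensions of the data matrix, and that formula is not given in the abstract, so it is not reproduced here. As a swapped-in, empirical illustration of the same idea (a network built from random data should be empty), the sketch below shuffles each gene's profile independently across samples, records the largest absolute Pearson correlation seen, and uses it as a hard threshold. The function names and permutation count are hypothetical, and profiles are assumed not to be constant.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_threshold(data: list[list[float]], n_perm: int = 20, seed: int = 0) -> float:
    """Largest |r| between any two genes after shuffling each gene's profile
    independently across samples, which destroys real co-expression.
    Using it as a hard threshold leaves every permuted network empty."""
    rng = random.Random(seed)
    largest = 0.0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in data]
        for x, y in combinations(shuffled, 2):
            largest = max(largest, abs(pearson(x, y)))
    return largest


def coexpression_edges(data: list[list[float]], threshold: float) -> list[tuple[int, int]]:
    """Gene pairs (row indices) whose |r| exceeds the hard threshold."""
    return [(i, j) for i, j in combinations(range(len(data)), 2)
            if abs(pearson(data[i], data[j])) > threshold]


genes = [  # rows = genes, columns = samples (made-up values)
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [3, 1, 4, 1, 5, 9],
]
t = permutation_threshold(genes)
print(round(t, 3), coexpression_edges(genes, t))
```

With only six samples the permutation threshold comes out high, which mirrors the abstract's point that even large correlations can arise by chance when samples are few; the trade-off, as the abstract notes for its own threshold, is a cost in false negative edges.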
10
Mostafavi S, Battle A, Zhu X, Potash JB, Weissman MM, Shi J, Beckman K, Haudenschild C, McCormick C, Mei R, Gameroff MJ, Gindes H, Adams P, Goes FS, Mondimore FM, MacKinnon DF, Notes L, Schweizer B, Furman D, Montgomery SB, Urban AE, Koller D, Levinson DF. Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing. Mol Psychiatry 2014; 19:1267-74. [PMID: 24296977 PMCID: PMC5404932 DOI: 10.1038/mp.2013.161] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 05/24/2013] [Revised: 08/27/2013] [Accepted: 09/24/2013] [Indexed: 01/23/2023]
Abstract
A study of genome-wide gene expression in major depressive disorder (MDD) was undertaken in a large population-based sample to determine whether altered expression levels of genes and pathways could provide insights into biological mechanisms that are relevant to this disorder. Gene expression studies have the potential to detect changes that may be because of differences in common or rare genomic sequence variation, environmental factors or their interaction. We recruited a European ancestry sample of 463 individuals with recurrent MDD and 459 controls, obtained self-report and semi-structured interview data about psychiatric and medical history and other environmental variables, sequenced RNA from whole blood and genotyped a genome-wide panel of common single-nucleotide polymorphisms. We used analytical methods to identify MDD-related genes and pathways using all of these sources of information. In analyses of association between MDD and expression levels of 13 857 single autosomal genes, accounting for multiple technical, physiological and environmental covariates, a significant excess of low P-values was observed, but there was no significant single-gene association after genome-wide correction. Pathway-based analyses of expression data detected significant association of MDD with increased expression of genes in the interferon α/β signaling pathway. This finding could not be explained by potentially confounding diseases and medications (including antidepressants) or by computationally estimated proportions of white blood cell types. Although cause-effect relationships cannot be determined from these data, the results support the hypothesis that altered immune signaling has a role in the pathogenesis, manifestation, and/or the persistence and progression of MDD.
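The abstract reports a significant excess of low P-values but no single-gene association surviving genome-wide correction, without naming the correction used. As one widely used example, not necessarily the study's procedure, the Benjamini-Hochberg step-up method converts per-gene P-values into false-discovery-rate-adjusted values:

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# prints [0.04, 0.0533, 0.0533, 0.2]
```

With roughly 14 000 genes tested, m is large, so a gene needs a very small raw P-value to reach a conventional cutoff such as 0.05; this is how many low P-values can coexist with no individually significant gene.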
Affiliation(s)
- S Mostafavi
- Department of Computer Science, Stanford University, Stanford, CA, USA
- A Battle
- Department of Computer Science, Stanford University, Stanford, CA, USA
- X Zhu
- Department of Psychiatry and Behavioral Science, Stanford University School of Medicine, Stanford, CA, USA
- J B Potash
- Department of Psychiatry, University of Iowa Hospitals & Clinics, Iowa City, IA, USA
- M M Weissman
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, NY, USA
- J Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- K Beckman
- Biomedical Genomics Center, University of Minnesota, Minneapolis, MN, USA
- R Mei
- Centrillion Biosciences, Inc., Palo Alto, CA, USA
- M J Gameroff
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, NY, USA
- H Gindes
- Department of Psychiatry, University of Iowa Hospitals & Clinics, Iowa City, IA, USA
- P Adams
- Department of Psychiatry, Columbia University and New York State Psychiatric Institute, New York, NY, USA
- F S Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, USA
- F M Mondimore
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, USA
- D F MacKinnon
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, USA
- L Notes
- Department of Clinical Psychology, American University, Washington, DC, USA
- B Schweizer
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, USA
- D Furman
- Department of Microbiology & Immunology, School of Medicine, Stanford University, Stanford, CA, USA
- S B Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
- A E Urban
- Department of Psychiatry and Behavioral Science, Stanford University School of Medicine, Stanford, CA, USA
- D Koller
- Department of Computer Science, Stanford University, Stanford, CA, USA
- D F Levinson
- Department of Psychiatry and Behavioral Science, Stanford University School of Medicine, Stanford, CA, USA
11
Ling MHT, Poh CL. A predictor for predicting Escherichia coli transcriptome and the effects of gene perturbations. BMC Bioinformatics 2014; 15:140. [PMID: 24884349 PMCID: PMC4038595 DOI: 10.1186/1471-2105-15-140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/28/2013] [Accepted: 05/09/2014] [Indexed: 11/24/2022]
Abstract
Background: A means to predict the effects of gene over-expression, knockouts, and environmental stimuli in silico is useful for systems biologists to develop and test hypotheses. Several studies have predicted the expression of all Escherichia coli genes from sequences and reported a correlation of 0.301 between predicted and actual expression. However, these approaches do not allow biologists to study the effects of gene perturbations on the native transcriptome.
Results: We developed a predictor that predicts transcriptome-scale gene expression from a small number (n = 59) of known gene expression values using a gene co-expression network, and which can be used to predict the effects of over-expressions and knockdowns on the E. coli transcriptome. In terms of transcriptome prediction, our results show that the correlation between predicted and actual expression values is 0.467, which is similar to the microarray intra-array variation (p-value = 0.348), suggesting that intra-array variation accounts for a substantial portion of the transcriptome prediction error. In terms of predicting the effects of gene perturbation(s), our results suggest that the expression of 83% of the genes affected by perturbation can be predicted within 40% error, and that the correlation between predicted and actual expression values among the affected genes is 0.698. With the ability to predict the effects of gene perturbations, we demonstrated that our predictor has the potential to estimate the effects of varying gene expression levels on the native transcriptome.
Conclusion: We present a potential means to predict an entire transcriptome and a tool to estimate the effects of gene perturbations for E. coli, which will aid biologists in hypothesis development. This study forms the baseline for future work on using gene co-expression networks for gene expression prediction.
Affiliation(s)
- Maurice H T Ling
- School of Chemical and Biomedical Engineering, Nanyang Technological University, Nanyang Ave, Singapore, Singapore.
12
Discrimination of the expression of paralogous microRNA precursors that share the same major mature form. PLoS One 2014; 9:e90591. [PMID: 24594692 PMCID: PMC3940925 DOI: 10.1371/journal.pone.0090591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/28/2013] [Accepted: 02/02/2014] [Indexed: 12/21/2022]
Abstract
Background: MicroRNAs (miRNAs) are a class of small non-coding RNAs generated from endogenous transcripts that form hairpin structures. The hairpin precursor is processed into two mature miRNAs that form major/minor duplexes. Mature miRNAs regulate gene expression by cleaving mRNA or repressing protein translation. Numerous miRNAs have been discovered via deep sequencing. Many miRNAs are produced from multiple genome sites. These miRNAs are grouped into paralogous families of miRNAs that generate the same major mature form within organisms. Currently, no method of distinguishing the expression of these miRNAs is available.
Results: In the present study, strategies were developed to discriminate and quantify the expression of paralogous miRNA precursors. First, paralogous miRNA precursors that were differentially expressed in tissues were identified through analysis of the coexpression scores of their major and minor forms based on deep sequencing data. Then the precursors were identified by monitoring the expression of their host gene or minor form using real-time PCR. Finally, precursors were identified by assessing the expression of clusters of miRNA members. These approaches were used to distinguish miR-128-1 and miR-128-2 as well as miR-194-1 and miR-194-2. The mechanism of transcription related to the differential expression of miR-194-1 and miR-194-2 was also investigated.
Conclusion: This is the first report to distinguish paralogous miRNA copies by analyzing the expression of major-minor pairs, the host gene, and miRNA clusters. Discriminating paralogous precursors can provide useful information for investigating the mechanisms that regulate miRNA gene expression under different physiological and pathological conditions.
13
Mostafavi S, Battle A, Zhu X, Urban AE, Levinson D, Montgomery SB, Koller D. Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge. PLoS One 2013; 8:e68141. [PMID: 23874524 PMCID: PMC3715474 DOI: 10.1371/journal.pone.0068141] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 04/16/2013] [Accepted: 05/25/2013] [Indexed: 11/19/2022]
Abstract
Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires careful accounting for factors that may introduce systematic, confounding variability into the expression measurements and thereby produce spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches and, using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors and performs as well as or better than existing approaches at a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
Affiliation(s)
- Sara Mostafavi
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Alexis Battle
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Xiaowei Zhu
- Department of Psychiatry & Behavioral Science, Stanford University, Stanford, California, United States of America
- Alexander E. Urban
- Department of Psychiatry & Behavioral Science, Stanford University, Stanford, California, United States of America
- Douglas Levinson
- Department of Psychiatry & Behavioral Science, Stanford University, Stanford, California, United States of America
- Stephen B. Montgomery
- Department of Pathology, Stanford University, Stanford, California, United States of America
- Daphne Koller
- Department of Computer Science, Stanford University, Stanford, California, United States of America
14
Mooney M, Bond J, Monks N, Eugster E, Cherba D, Berlinski P, Kamerling S, Marotti K, Simpson H, Rusk T, Tembe W, Legendre C, Benson H, Liang W, Webb CP. Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris. PLoS One 2013; 8:e61088. [PMID: 23593398 PMCID: PMC3617154 DOI: 10.1371/journal.pone.0061088] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 08/22/2012] [Accepted: 03/05/2013] [Indexed: 11/19/2022]
Abstract
Comparative oncology is a developing research discipline that is being used to assist our understanding of human neoplastic diseases. Companion canines are a preferred animal oncology model due to spontaneous tumor development and similarity to human disease at the pathophysiological level. We use a paired RNA sequencing (RNA-Seq)/microarray analysis of a set of four normal canine lymph nodes and ten canine lymphoma fine-needle aspirates to identify technical biases and variation between the technologies and convergence on biological disease pathways. Surrogate Variable Analysis (SVA) provides a formal multivariate analysis of the combined RNA-Seq/microarray data set. Applying SVA to the data allows us to decompose variation into contributions associated with transcript abundance, differences between the technologies, and latent variation within each technology. A substantial and highly statistically significant component of the variation reflects transcript abundance, and RNA-Seq appeared more sensitive for detecting transcripts expressed at low levels. Latent random variation among RNA-Seq samples is also distinct in character from that affecting microarray samples. In particular, we observed variation between RNA-Seq samples that reflects transcript GC content. Platform-independent variable decomposition using SVA, without a priori knowledge of the sources of variation, represents a generalizable method for cross-platform data analysis. Using each technology, we identified genes differentially expressed between normal lymph nodes of disease-free dogs and a subset of the diseased dogs diagnosed with B-cell lymphoma. There is statistically significant overlap between the RNA-Seq and microarray sets of differentially expressed genes. Analysis of overlapping genes in the context of biological systems suggests elevated expression and activity of PI3K signaling in B-cell lymphoma biopsies compared with normal biopsies, consistent with literature describing successful use of drugs targeting this pathway in lymphomas.
Affiliation(s)
- Marie Mooney
- Laboratory of Translational Medicine, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Jeffrey Bond
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Noel Monks
- Laboratory of Translational Medicine, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Emily Eugster
- Laboratory of Translational Medicine, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- David Cherba
- Laboratory of Translational Medicine, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Pamela Berlinski
- Pfizer Animal Health, Pfizer Inc, Kalamazoo, Michigan, United States of America
- Steve Kamerling
- Pfizer Animal Health, Pfizer Inc, Kalamazoo, Michigan, United States of America
- Keith Marotti
- Pfizer Animal Health, Pfizer Inc, Kalamazoo, Michigan, United States of America
- Heather Simpson
- Pfizer Animal Health, Pfizer Inc, Kalamazoo, Michigan, United States of America
- Tony Rusk
- Animal Clinical Investigation, Washington, D.C., United States of America
- Waibhav Tembe
- Collaborative Bioinformatics Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Christophe Legendre
- Collaborative Bioinformatics Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Hollie Benson
- Collaborative Sequencing Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Winnie Liang
- Collaborative Sequencing Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Craig Paul Webb
- Laboratory of Translational Medicine, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
15
Sun X, Casbas-Hernandez P, Bigelow C, Makowski L, Joseph Jerry D, Smith Schneider S, Troester MA. Normal breast tissue of obese women is enriched for macrophage markers and macrophage-associated gene expression. Breast Cancer Res Treat 2011; 131:1003-12. [PMID: 22002519 DOI: 10.1007/s10549-011-1789-3] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Received: 08/11/2011] [Accepted: 09/15/2011] [Indexed: 12/24/2022]
Abstract
Activation of inflammatory pathways is one plausible mechanism underlying the association between obesity and increased breast cancer risk. However, macrophage infiltration and local biomarkers of inflammation in breast adipose tissue have seldom been studied in association with obesity. Gene expression profiles of normal breast tissue from reduction mammoplasty patients were evaluated by whole-genome microarrays to identify patterns associated with obesity status (normal-weight, body mass index (BMI) <25; overweight, BMI 25-29.9; obese, BMI ≥30). The presence of macrophage-enriched inflammatory loci with immunopositivity for CD68 protein was evaluated by immunohistochemistry (IHC). After adjusting for confounding by age, 760 genes were differentially expressed (203 up and 557 down; FDR = 0.026) between normal-weight and obese women. Gene ontology analysis suggested significant enrichment for pathways involving IL-6, IL-8, and CCR5 signaling in macrophages, and RXRα and PPARα activation, consistent with a pro-inflammatory state and suggestive of macrophage infiltration. Gene set enrichment analysis also demonstrated that the genomic signatures of monocytes and macrophages were over-represented in the obese group, with FDRs of 0.08 and 0.13, respectively. Increased macrophage infiltration was confirmed by IHC, which showed that the breast adipose tissue of obese women had higher average macrophage counts (mean = 8.96 vs. 3.56 in normal-weight women) and inflammatory foci counts (mean = 4.91 vs. 2.67 in normal-weight women). Obesity is associated with local inflammation and macrophage infiltration in normal human breast adipose tissue. Given the role of macrophages in carcinogenesis, these findings have important implications for breast cancer etiology and progression.
Affiliation(s)
- Xuezheng Sun
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA