1
|
Significance analysis for clustering with single-cell RNA-sequencing data. Nat Methods 2023; 20:1196-1202. [PMID: 37429993 DOI: 10.1038/s41592-023-01933-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/01/2022] [Accepted: 06/01/2023] [Indexed: 07/12/2023]
Abstract
Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
|
2
|
Social determinants of health derived from people with opioid use disorder: Improving data collection, integration and use with cross-domain collaboration and reproducible, data-centric, notebook-style workflows. Front Med (Lausanne) 2023; 10:1076794. [PMID: 36936205 PMCID: PMC10017859 DOI: 10.3389/fmed.2023.1076794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Accepted: 01/30/2023] [Indexed: 03/06/2023] Open
Abstract
Deriving social determinants of health from underserved populations is an important step in the process of improving the well-being of these populations and in driving policy improvements to facilitate positive change in health outcomes. Collection, integration, and effective use of clinical data for this purpose present a variety of specific challenges. We assert that expertise from three distinct domains (medical, statistical, and computer and data science) can be combined and applied along with provenance-aware, self-documenting workflow tools. This combination permits data integration and facilitates the creation of reproducible workflows and usable (reproducible) results from the sensitive and disparate sources of clinical data that exist for underserved populations.
|
3
|
Genome-wide DNA methylation patterns reveal clinically relevant predictive and prognostic subtypes in human osteosarcoma. Commun Biol 2022; 5:213. [PMID: 35260776 PMCID: PMC8904843 DOI: 10.1038/s42003-022-03117-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2021] [Accepted: 01/24/2022] [Indexed: 12/14/2022] Open
Abstract
Aberrant methylation of genomic DNA has been reported in many cancers. Specific DNA methylation patterns have been shown to provide clinically useful prognostic information and define molecular disease subtypes with different response to therapy and long-term outcome. Osteosarcoma is an aggressive malignancy for which approximately half of tumors recur following standard combined surgical resection and chemotherapy. No accepted prognostic factor save tumor necrosis in response to adjuvant therapy currently exists, and traditional genomic studies have thus far failed to identify meaningful clinical associations. We studied the genome-wide methylation state of primary tumors and tested how they predict patient outcomes. We discovered relative genomic hypomethylation to be strongly predictive of response to standard chemotherapy. Recurrence and survival were also associated with genomic methylation, but through more site-specific patterns. Furthermore, the methylation patterns were reproducible in three small independent clinical datasets. Downstream transcriptional, in vitro, and pharmacogenomic analysis provides insight into the clinical translation of the methylation patterns. Our findings suggest the assessment of genomic methylation may represent a strategy for stratifying patients for the application of alternative therapies.
|
4
|
Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 2021; 184:1836-1857.e22. [PMID: 33713619 PMCID: PMC7874909 DOI: 10.1016/j.cell.2021.02.018] [Citation(s) in RCA: 133] [Impact Index Per Article: 44.3] [Received: 09/22/2020] [Revised: 12/16/2020] [Accepted: 02/05/2021] [Indexed: 02/06/2023]
Abstract
COVID-19 exhibits extensive patient-to-patient heterogeneity. To link immune response variation to disease severity and outcome over time, we longitudinally assessed circulating proteins as well as 188 surface protein markers, transcriptome, and T cell receptor sequence simultaneously in single peripheral immune cells from COVID-19 patients. Conditional-independence network analysis revealed primary correlates of disease severity, including gene expression signatures of apoptosis in plasmacytoid dendritic cells and attenuated inflammation but increased fatty acid metabolism in CD56dimCD16hi NK cells linked positively to circulating interleukin (IL)-15. CD8+ T cell activation was apparent without signs of exhaustion. Although cellular inflammation was depressed in severe patients early after hospitalization, it became elevated by days 17–23 post symptom onset, suggestive of a late wave of inflammatory responses. Furthermore, circulating protein trajectories at this time were divergent between and predictive of recovery versus fatal outcomes. Our findings stress the importance of timing in the analysis, clinical monitoring, and therapeutic intervention of COVID-19.
|
5
|
Gut/Oral Bacteria Variability May Explain the High Efficacy of Green Tea in Rodent Tumor Inhibition and Its Absence in Humans. Molecules 2020; 25:4753. [PMID: 33081212 PMCID: PMC7594096 DOI: 10.3390/molecules25204753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/16/2020] [Revised: 10/09/2020] [Accepted: 10/12/2020] [Indexed: 02/07/2023] Open
Abstract
Consumption of green tea (GT) and GT polyphenols has prevented a range of cancers in rodents but has had mixed results in humans. Human subjects who drank GT for weeks showed changes in oral microbiome. However, GT-induced changes in RNA in oral epithelium were subject-specific, suggesting GT-induced changes of the oral epithelium occurred but differed across individuals. In contrast, studies in rodents consuming GT polyphenols revealed obvious changes in epithelial gene expression. GT polyphenols are poorly absorbed by digestive tract epithelium. Their metabolism by gut/oral microbial enzymes occurs and can alter absorption and function of these molecules and thus their bioactivity. This might explain the overall lack of consistency in oral epithelium RNA expression changes seen in human subjects who consumed GT. Each human has different gut/oral microbiomes, so they may have different levels of polyphenol-metabolizing bacteria. We speculate the similar gut/oral microbiomes in, for example, mice housed together are responsible for the minimal variance observed in tissue GT responses within a study. The consistency of the tissue response to GT within a rodent study eases the selection of a dose level that affects tumor rates. This leads to the theory that determination of optimal GT doses in a human requires knowledge about the gut/oral microbiome in that human.
|
6
|
U-Statistical Inference for Hierarchical Clustering. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1796398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
7
|
MicroRNA-mRNA networks define translatable molecular outcome phenotypes in osteosarcoma. Sci Rep 2020; 10:4409. [PMID: 32157112 PMCID: PMC7064533 DOI: 10.1038/s41598-020-61236-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2019] [Accepted: 02/03/2020] [Indexed: 12/30/2022] Open
Abstract
There is a lack of well validated prognostic biomarkers in osteosarcoma, a rare, recalcitrant disease for which treatment standards have not changed in over 20 years. We performed microRNA sequencing in 74 frozen osteosarcoma biopsy samples, constituting the largest single center translationally analyzed osteosarcoma cohort to date, and we separately analyzed a multi-omic dataset from a large NCI supported national cooperative group cohort. We validated the prognostic value of candidate microRNA signatures and contextualized them in relevant transcriptomic and epigenomic networks. Our results reveal the existence of molecularly defined phenotypes associated with outcome independent of clinicopathologic features. Through machine learning based integrative pharmacogenomic analysis, the microRNA biomarkers identify novel therapeutics for stratified application in osteosarcoma. The previously unrecognized osteosarcoma subtypes with distinct clinical courses and response to therapy could be translatable for discerning patients appropriate for more intensified, less intensified, or alternate therapeutic regimens.
|
8
|
Focused multidimensional scaling: interactive visualization for exploration of high-dimensional data. BMC Bioinformatics 2019; 20:221. [PMID: 31046657 PMCID: PMC6498510 DOI: 10.1186/s12859-019-2780-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/11/2018] [Accepted: 03/27/2019] [Indexed: 01/26/2023] Open
Abstract
Background: Visualization is an important tool for generating meaning from scientific data, but the visualization of structures in high-dimensional data (such as from high-throughput assays) presents unique challenges. Dimension reduction methods are key in solving this challenge, but these methods can be misleading, especially when apparent clustering in the dimension-reducing representation is used as the basis for reasoning about relationships within the data. Results: We present two interactive visualization tools, distnet and focusedMDS, that help in assessing the validity of a dimension-reducing plot and in interactively exploring relationships between objects in the data. The distnet tool is used to examine discrepancies between the placement of points in a two-dimensional visualization and the points’ actual similarities in feature space. The focusedMDS tool is an intuitive, interactive multidimensional scaling tool for exploring the relationships of one particular data point to the others, which might be useful in a personalized medicine framework. Conclusions: We introduce here two freely available tools for visually exploring and verifying the validity of dimension-reducing visualizations and the biological information gained from them. The use of such tools can confirm that conclusions drawn from dimension-reducing visualizations are not simply artifacts of the visualization method, but are real biological insights.
|
9
|
E2F signature is predictive for the pancreatic adenocarcinoma clinical outcome and sensitivity to E2F inhibitors, but not for the response to cytotoxic-based treatments. Sci Rep 2018; 8:8330. [PMID: 29844366 PMCID: PMC5974374 DOI: 10.1038/s41598-018-26613-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/27/2017] [Accepted: 05/14/2018] [Indexed: 12/26/2022] Open
Abstract
The main goal of this study was to find clinically relevant strategies for classifying patients with pancreatic ductal adenocarcinoma (PDAC) for individualized treatment. In the present study, a set of 55 patient-derived xenografts (PDX) was obtained and their transcriptomes were analyzed using an Affymetrix approach. A supervised bioinformatics-based analysis led us to classify these PDX into two main groups, named E2F-highly dependent and E2F-lowly dependent. Subsequent characterization by Kaplan-Meier analysis demonstrated that E2F high patients had significantly shorter survival than E2F low patients (9.5 months vs. 16.8 months; p = 0.0066). We then tested whether E2F transcriptional target levels were associated with the response to cytotoxic treatments by comparing the IC50 values of E2F high and E2F low cells after gemcitabine, 5-fluorouracil, oxaliplatin, docetaxel or irinotecan treatment, and no association was found. We then identified an E2F inhibitor compound, named ly101-4B, and observed that E2F-highly dependent cells were more sensitive to it (IC50 of 19.4 ± 1.8 µM vs. 44.1 ± 4.4 µM; p = 0.0061). In conclusion, in this work we describe an E2F target expression-based classification that could be predictive for patient outcome and, more importantly, for the sensitivity of tumors to E2F inhibitors as a treatment. Finally, we can assume that phenotypic characterization of PDAC, essentially by RNA expression analysis, can help to predict clinical outcome and the response to some treatments when these are rationally selected.
|
10
|
Abstract
Consumption of green tea (GT) extracts or purified catechins has shown the ability to prevent oral and other cancers and inhibit cancer progression in rodent models, but the evidence for this in humans is mixed. Working with humans, we sought to understand the source of variable responses to GT by examining its effects on oral epithelium. Lingual epithelial RNA and lingual and gingival microbiota were measured before and after 4 weeks of exposure in tobacco smokers, who are at high risk of oral cancer. GT consumption had on average inconsistent effects on miRNA expression in the oral epithelium. Only analysis that examined paired miRNAs, showing changed and coordinated expression with GT exposure, provided evidence for a GT effect on miRNAs, identifying miRNAs co-expressed with two hubs, miR-181a-5p and 301a-3p. An examination of the microbiome on cancer-prone lingual mucosa, in contrast, showed clear shifts in the relative abundance of Streptococcus, Staphylococcus, and other genera after GT exposure. These data support the idea that tea consumption can consistently change oral bacteria in humans, which may affect carcinogenesis, but argue that GT effects on oral epithelial miRNA expression in humans vary between individuals.
|
11
|
Data Perturbation Independent Diagnosis and Validation of Breast Cancer Subtypes Using Clustering and Patterns. Cancer Inform 2017. [DOI: 10.1177/117693510600200006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
Molecular stratification of disease based on expression levels of sets of genes can help guide therapeutic decisions if such classifications can be shown to be stable against variations in sample source and data perturbation. Classifications inferred from one set of samples in one lab should be able to consistently stratify a different set of samples in another lab. We present a method for assessing such stability and apply it to the breast cancer (BCA) datasets of Sorlie et al. 2003 and Ma et al. 2003. We find that within the now commonly accepted BCA categories identified by Sorlie et al., Luminal A and Basal are robust, but Luminal B and ERBB2+ are not. In particular, 36% of the samples identified as Luminal B and 55% identified as ERBB2+ cannot be assigned an accurate category because the classification is sensitive to data perturbation. We identify a “core cluster” of samples for each category, and from these we determine “patterns” of gene expression that distinguish the core clusters from each other. We find that the best markers for Luminal A and Basal are (ESR1, LIV1, GATA-3) and (CCNE1, LAD1, KRT5), respectively. Pathways enriched in the patterns regulate apoptosis, tissue remodeling and the immune response. We use a different dataset (Ma et al. 2003) to test the accuracy with which samples can be allocated to the four disease subtypes. We find, as expected, that the classification of samples identified as Luminal A and Basal is robust but classification into the other two subtypes is not.
|
12
|
Abstract
BRB-ArrayTools is an integrated software system for the comprehensive analysis of DNA microarray experiments. It was developed by professional biostatisticians experienced in the design and analysis of DNA microarray studies and incorporates methods developed by leading statistical laboratories. The software is designed for use by biomedical scientists who wish to have access to state-of-the-art statistical methods for the analysis of gene expression data and to receive training in the statistical analysis of high dimensional data. The software provides the most extensive set of tools available for predictive classifier development and complete cross-validation. It offers extensive links to genomic websites for gene annotation and analysis tools for pathway analysis. An archive of over 100 datasets of published microarray data with associated clinical data is provided and BRB-ArrayTools automatically imports data from the Gene Expression Omnibus public archive at the National Center for Biotechnology Information.
|
13
|
Novel near-diploid ovarian cancer cell line derived from a highly aneuploid metastatic ovarian tumor. PLoS One 2017; 12:e0182610. [PMID: 28787462 PMCID: PMC5546722 DOI: 10.1371/journal.pone.0182610] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/30/2016] [Accepted: 07/23/2017] [Indexed: 01/01/2023] Open
Abstract
A new ovarian near-diploid cell line, OVDM1, was derived from a highly aneuploid serous ovarian metastatic adenocarcinoma. A metastatic tumor was obtained from a 47-year-old Ashkenazi Jewish patient three years after the first surgery removed the primary tumor, both ovaries, and the remaining reproductive organs. OVDM1 was characterized by cell morphology, genotyping, tumorigenic assay, mycoplasma testing, spectral karyotyping (SKY), and molecular profiling of the whole genome by aCGH and gene expression microarray. Targeted sequencing of a panel of cancer-related genes was also performed. Hierarchical clustering of gene expression data clearly confirmed the ovarian origin of the cell line. OVDM1 has a near-diploid karyotype with low-level aneuploidy, but samples of the original metastatic tumor were grossly aneuploid. A number of single nucleotide variations (SNVs)/mutations were detected in OVDM1 and the metastatic tumor samples. Some of them were cancer-related according to the COSMIC and HGMD databases (no founder mutations in BRCA1 and BRCA2 were found). A large number of focal copy number alterations (FCNAs) were detected, including homozygous deletions (HDs) targeting WWOX and GATA4. Progression of OVDM1 from early to late passages was accompanied by preservation of the near-diploid status, acquisition of only a few additional large chromosomal rearrangements, and more than 100 new small FCNAs. Most of the newly acquired FCNAs seem to be related to localized but massive DNA fragmentation (chromothripsis-like rearrangements). The newly developed near-diploid OVDM1 cell line offers an opportunity to evaluate tumorigenesis pathways/events in a minor clone of metastatic ovarian adenocarcinoma as well as mechanisms of chromothripsis.
|
14
|
Gene transcription profiling in wild and laboratory-exposed eels: Effect of captivity and in situ chronic exposure to pollution. THE SCIENCE OF THE TOTAL ENVIRONMENT 2016; 571:92-102. [PMID: 27470668 DOI: 10.1016/j.scitotenv.2016.07.131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/02/2016] [Revised: 07/13/2016] [Accepted: 07/18/2016] [Indexed: 06/06/2023]
Abstract
Aquatic ecosystems are subjected to a variety of man-induced stressors but also vary spatially and temporally due to variation in natural factors. In such complex environments, it remains difficult to detect, dissociate and evaluate the effects of contaminants in wild organisms. In this context, the aim of this study was to test whether the hepatic transcriptome profile of fish may be used to detect in situ exposure to a particular contaminant. Transcriptomic profiles from laboratory-exposed and wild eels sampled along a contamination gradient were compared. During laboratory experiments, fish were exposed for 45 days to different pollutants (Hg, PCBs, OCPs or Cd) or natural factors (temperature, salinity or low food supply) at levels close to those found in the sampling sites. A strong difference was observed between the transcriptomic profiles obtained from wild and laboratory-exposed animals (whatever the sites or experimental conditions), suggesting a general stress induced by captivity in the laboratory. Among the biological functions that were up-regulated in laboratory eels in comparison to wild eels, histone modification was the most represented. This finding suggests that laboratory conditions could affect the epigenome of fish and thus modulate the transcriptional responses developed by fish in response to pollutant exposure. Among experimental conditions, only the transcription profiles of laboratory animals exposed to cold temperature were correlated with those obtained from wild fish, and more significantly with fish from contaminated sites. Commonly regulated genes were mainly involved in cell differentiation and liver development, suggesting that stem/progenitor liver cells could be involved in the adaptive response developed by fish chronically exposed to pollutant mixtures.
|
15
|
|
16
|
Detecting the exposure to Cd and PCBs by means of a non-invasive transcriptomic approach in laboratory and wild contaminated European eels (Anguilla anguilla). ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2016; 23:5431-5441. [PMID: 26566612 DOI: 10.1007/s11356-015-5754-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/10/2015] [Accepted: 11/03/2015] [Indexed: 06/05/2023]
Abstract
Detecting and separating specific effects of contaminants in a multi-stress field context remain a major challenge in ecotoxicology. In this context, the aim of this study was to assess the usefulness of a non-invasive transcriptomic method, by means of a complementary DNA (cDNA) microarray comprising 1000 candidate genes, on caudal fin clips. Fin gene transcription patterns of European eels (Anguilla anguilla) exposed in the laboratory to cadmium (Cd) or a polychloro-biphenyl (PCBs) mixture but also of wild eels from three sampling sites with differing contamination levels were compared to test whether fin clips may be used to detect and discriminate the exposure to these contaminants. Also, transcriptomic profiles from the liver and caudal fin of eels experimentally exposed to Cd were compared to assess the detection sensitivity of the fin transcriptomic response. A similar number of genes were differentially transcribed in the fin and liver in response to Cd exposure, highlighting the detection sensitivity of fin clips. Moreover, distinct fin transcription profiles were observed in response to Cd or PCB exposure. Finally, the transcription profiles of eels from the most contaminated site clustered with those from laboratory-exposed fish. This study thus highlights the applicability and usefulness of performing gene transcription assays on non-invasive tissue sampling in order to detect the in situ exposure to Cd and PCBs in fish.
|
17
|
Abstract
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available, when the data are very high in dimension. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high dimensional low sample size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires the estimation of the eigenvalues of the covariance matrix for the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error, in the important case where there are a few very large eigenvalues. This paper addresses this critical challenge using a novel likelihood based soft thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. Major improvements in SigClust performance are shown by both mathematical analysis, based on the new notion of Theoretical Cluster Index, and extensive simulation studies. Applications to some cancer genomic data further demonstrate the usefulness of these improvements.
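The test described in this abstract can be illustrated with a minimal Python sketch. It is not the SigClust implementation: the function names, the farthest-point initialization of 2-means, and the number of simulations are assumptions made for illustration. In particular, the eigenvalues of the null covariance are simply passed in, and the usage example fills them with per-coordinate sample variances as a crude stand-in, whereas estimating those eigenvalues well (the soft thresholding approach) is the actual subject of the paper. The sketch computes the 2-means cluster index (within-cluster sum of squares divided by total sum of squares) and compares it with the same index on data simulated from a single Gaussian.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_index(points, rng, restarts=3, iters=15):
    """2-means cluster index: within-cluster SS / total SS (smaller = stronger split)."""
    centroid = tuple(sum(col) / len(points) for col in zip(*points))
    total = sum(sq_dist(p, centroid) for p in points)
    best = total
    for _ in range(restarts):
        # Assumed initialization: a random point and the point farthest from it.
        first = rng.choice(points)
        centers = [first, max(points, key=lambda p: sq_dist(p, first))]
        for _ in range(iters):
            groups = ([], [])
            for p in points:
                groups[sq_dist(p, centers[1]) < sq_dist(p, centers[0])].append(p)
            centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                       for g, c in zip(groups, centers)]
        within = sum(sq_dist(p, c) for g, c in zip(groups, centers) for p in g)
        best = min(best, within)
    return best / total

def sigclust_sketch(points, eigenvalues, n_sim=100, seed=0):
    """Empirical p-value: share of single-Gaussian simulations with an index <= observed."""
    rng = random.Random(seed)
    observed = cluster_index(points, rng)
    null = []
    for _ in range(n_sim):
        # Null model: one Gaussian with diagonal covariance given by the eigenvalues.
        sim = [tuple(rng.gauss(0.0, lam ** 0.5) for lam in eigenvalues) for _ in points]
        null.append(cluster_index(sim, rng))
    return observed, sum(ci <= observed for ci in null) / n_sim

# Hypothetical toy data: two well-separated groups along the first coordinate.
gen = random.Random(2)
data = [(gen.gauss(-5, 0.5), gen.gauss(0, 0.5)) for _ in range(20)] + \
       [(gen.gauss(5, 0.5), gen.gauss(0, 0.5)) for _ in range(20)]
means = [sum(col) / len(data) for col in zip(*data)]
# Stand-in for eigenvalue estimation (assumption): per-coordinate sample variances.
variances = [sum((x - m) ** 2 for x in col) / (len(data) - 1)
             for col, m in zip(zip(*data), means)]
observed, p_value = sigclust_sketch(data, variances)
print(observed, p_value)
```

On toy data like this, the observed index should fall far below every simulated one, so the empirical p-value is small. The inflation of type-I error discussed in the abstract arises in the eigenvalue estimation step, which this sketch deliberately leaves out.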
|
18
|
Molecular subtypes of high-grade serous ovarian cancer: the holy grail? J Natl Cancer Inst 2014; 106:dju297. [PMID: 25269490 DOI: 10.1093/jnci/dju297] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022] Open
|
19
|
Critical limitations of consensus clustering in class discovery. Sci Rep 2014; 4:6207. [PMID: 25158761 PMCID: PMC4145288 DOI: 10.1038/srep06207] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 05/19/2014] [Accepted: 08/08/2014] [Indexed: 11/09/2022] Open
Abstract
Consensus clustering (CC) has been adopted for unsupervised class discovery in many genomic studies. It calculates how frequently two samples are grouped together in repeated clustering runs, and uses the resulting pairwise "consensus rates" for visual demonstration that clusters exist, for comparing cluster stability, and for estimating the optimal cluster number (K). However, the sensitivity and specificity of CC have not been systemically assessed. Through simulations we find that CC is able to divide randomly generated unimodal data into apparently stable clusters for a range of K, essentially reporting chance partitions of cluster-less data. For data with known structure, the common implementations of CC perform poorly in identifying the true K. These results suggest that CC should be applied and interpreted with caution. We found that a new metric based on CC, the proportion of ambiguously clustered pairs (PAC), infers K equally or more reliably than similar methods in simulated data with known K. Our overall approach involves the use of realistic null distributions based on the observed gene-gene correlation structure in a given study, and the implementation of PAC to more accurately estimate K. We discuss the strength of our approach in the context of other ensemble-based methods.
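The consensus rates and the PAC metric described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the plain k-means routine with farthest-point initialization, the 80% subsampling fraction, the number of runs, and the 0.1/0.9 ambiguity thresholds are all assumptions chosen for illustration. For each pair of samples, the consensus rate is the fraction of runs containing both samples in which they share a cluster; PAC is the fraction of pairs whose rate lies strictly between the two thresholds.

```python
import random
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumed initialization: a random point, then repeatedly the farthest point.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def consensus_rates(points, k, runs=50, frac=0.8, seed=0):
    """Per pair: share of subsampled runs containing both members in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = dict.fromkeys(together, 0)
    for _ in range(runs):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            pair = (min(a, b), max(a, b))
            sampled[pair] += 1
            together[pair] += la == lb
    return {p: together[p] / sampled[p] for p in together if sampled[p]}

def pac(rates, lower=0.1, upper=0.9):
    """Proportion of ambiguously clustered pairs (thresholds are assumptions)."""
    vals = list(rates.values())
    return sum(lower < v < upper for v in vals) / len(vals)

# Hypothetical toy data: two tight, well-separated groups.
gen = random.Random(1)
blobs = [(gen.gauss(0, 0.3), gen.gauss(0, 0.3)) for _ in range(15)] + \
        [(gen.gauss(10, 0.3), gen.gauss(10, 0.3)) for _ in range(15)]
rates = consensus_rates(blobs, k=2)
print(pac(rates))
```

In practice PAC would be computed for a range of K and the K with the lowest value preferred. The abstract's further point, comparing against null data that preserve the observed gene-gene correlation structure, is not covered by this sketch.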
|
20
|
Neonatal atlas construction using sparse representation. Hum Brain Mapp 2014; 35:4663-77. [PMID: 24638883 DOI: 10.1002/hbm.22502] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 12/06/2013] [Revised: 02/11/2014] [Accepted: 02/18/2014] [Indexed: 11/05/2022] Open
Abstract
Atlas construction generally includes first an image registration step to normalize all images into a common space and then an atlas building step to fuse the information from all the aligned images. Although numerous atlas construction studies have been performed to improve the accuracy of the image registration step, an unweighted or simply weighted average is often used in the atlas building step. In this article, we propose a novel patch-based sparse representation method for atlas construction after all images have been registered into the common space. By taking advantage of local sparse representation, more anatomical details can be recovered in the built atlas. To make the anatomical structures spatially smooth in the atlas, anatomical feature constraints on the group structure of representations and the overlapping of neighboring patches are imposed to ensure anatomical consistency between neighboring patches. The proposed method has been applied to 73 neonatal MR images with poor spatial resolution and low tissue contrast, for constructing a neonatal brain atlas with sharp anatomical details. Experimental results demonstrate that the proposed method can significantly enhance the quality of the constructed atlas by discovering more anatomical details, especially in the highly convoluted cortical regions. The resulting atlas demonstrates superior performance when applied to spatially normalizing three different neonatal datasets, compared with other state-of-the-art neonatal brain atlases.
|
21
|
Oxalate upregulates expression of IL-2Rβ and activates IL-2R signaling in HK-2 cells, a line of human renal epithelial cells. Am J Physiol Renal Physiol 2014; 306:F1039-46. [PMID: 24523387 DOI: 10.1152/ajprenal.00462.2013] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/18/2022] Open
Abstract
The role of inflammation in oxalate-induced nephrolithiasis is debated. Our gene expression study indicated an increase in interleukin-2 receptor β (IL-2Rβ) mRNA in response to oxalate (Koul S, Khandrika L, Meacham RB, Koul HK. PLoS ONE 7: e43886, 2012). Herein, we evaluated IL-2Rβ expression and its downstream signaling pathway in HK-2 cells in an effort to understand the mechanisms of oxalate nephrotoxicity. HK-2 cells were exposed to oxalate for various time points in the presence or absence of SB203580, a specific p38 MAPK inhibitor. Gene expression data were analyzed by Ingenuity Pathway Analysis software. mRNA expression was quantitated via real-time PCR, and changes in protein expression/kinase activation were analyzed by Western blotting. Exposure of HK-2 cells to oxalate resulted in increased transcription of IL-2Rβ mRNA and increased protein levels. Oxalate treatment also activated the IL-2Rβ signaling pathway (JAK1/STAT5 phosphorylation). Moreover, the increase in IL-2Rβ protein was dependent upon p38 MAPK activity. These results suggest that oxalate-induced activation of the IL-2Rβ pathway may lead to a plethora of cellular changes, the most common of which is the induction of inflammation. These results suggest a central role for the p38 MAPK pathway in mediating the effects of oxalate in renal cells, and additional studies may provide the key to unlocking novel biochemical targets in stone disease.
|
22
|
Bootstrap method to evaluate tightness of clusters with application to the Korean standard occlusion study. J STAT COMPUT SIM 2014. [DOI: 10.1080/00949655.2012.709517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
23
|
Abstract
The fundamental strategy of the current postgenomic era or the era of functional genomics is to expand the scale of biologic research from studying single genes or proteins to studying all genes or proteins simultaneously using a systematic approach. As recently developed methods for obtaining genome-wide mRNA expression data, oligonucleotide and DNA microarrays are particularly powerful in the context of knowing the entire genome sequence and can provide a global view of changes in gene expression patterns in response to physiologic alterations or manipulation of transcriptional regulators. In biomedical research, such an approach will ultimately determine biologic behavior of both normal and diseased tissues, which may provide insights into disease mechanisms and identify novel markers and candidates for diagnostic, prognostic and therapeutic intervention. However, microarray technology is still in a continuous state of evolution and development, and it may take time to implement microarrays as a routine medical device. Many limitations exist, and many challenges must be overcome before microarrays can be included in clinical medicine. In this review, a brief history of microarrays in biomedical research is provided, including experimental overview, limitations, challenges and future developments.
|
24
|
An integrated approach (CLuster Analysis Integration Method) to combine expression data and protein-protein interaction networks in agrigenomics: application on Arabidopsis thaliana. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2014; 18:155-65. [PMID: 24404838 DOI: 10.1089/omi.2013.0050] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/29/2022]
Abstract
Experimental co-expression data and protein-protein interaction networks are frequently used to analyze the interactions among genes or proteins. Recent studies have investigated methods to integrate these two sources of information. We propose a new method to integrate co-expression data obtained through DNA microarray analysis (MA) and protein-protein interaction (PPI) network data, and apply it to Arabidopsis thaliana. The proposed method identifies small subsets of highly interacting proteins. Based on an analysis of co-localization and mRNA developmental expression, we show that these groups provide important biological insights; additionally, these subsets are significantly enriched with respect to KEGG Pathways and can be used to predict successfully whether proteins belong to known pathways. Thus, the method is able to provide relevant biological information and support the functional identification of complex genetic traits of economic value in plant agrigenomics research. The method has been implemented in a prototype software tool named CLAIM (CLuster Analysis Integration Method) and can be downloaded from http://bio.cs.put.poznan.pl/research_fields . CLAIM is based on the separate clustering of MA and PPI data; the clusters are merged in a special graph; cliques of this graph are subsets of strongly connected proteins. The proposed method compared favorably with existing methods. CLAIM appears to be a useful semi-automated tool for protein functional analysis and warrants further evaluation in agrigenomics research.
|
25
|
Widespread decreased expression of immune function genes in human peripheral blood following radiation exposure. Radiat Res 2013; 180:575-83. [PMID: 24168352 DOI: 10.1667/rr13343.1] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 11/03/2022]
Abstract
We report a large-scale reduced expression of genes in pathways related to cell-type specific immunity functions that emerges from microarray analysis 48 h after ex vivo γ-ray irradiation (0, 0.5, 2, 5, 8 Gy) of human peripheral blood from five donors. This response is similar to that seen in patients at 24 h after the start of total-body irradiation and strengthens the rationale for the ex vivo model as an adjunct to human in vivo studies. The most marked response was in genes associated with natural killer (NK) cell immune functions, reflecting a relative loss of NK cells from the population. T- and B-cell mediated immunity genes were also significantly represented in the radiation response. Combined with our previous studies, a single gene expression signature was able to predict radiation dose range with 97% accuracy at times from 6-48 h after exposure. Gene expression signatures that may report on the loss or functional deactivation of blood cell subpopulations after radiation exposure may be particularly useful both for triage biodosimetry and for monitoring the effect of radiation mitigating treatments.
|
26
|
Abstract
MOTIVATION Validation and reproducibility of results is a central and pressing issue in genomics. Several recent embarrassing incidents involving the irreproducibility of high-profile studies have illustrated the importance of this issue and the need for rigorous methods for the assessment of reproducibility. RESULTS Here, we describe an existing statistical model that is very well suited to this problem. We explain its utility for assessing the reproducibility of validation experiments, and apply it to a genome-scale study of adenosine deaminase acting on RNA (ADAR)-mediated RNA editing in Drosophila. We also introduce a statistical method for planning validation experiments that will obtain the tightest reproducibility confidence limits, which, for a fixed total number of experiments, returns the optimal number of replicates for the study. AVAILABILITY Downloadable software and a web service for both the analysis of data from a reproducibility study and for the optimal design of these studies is provided at http://ccmbweb.ccv.brown.edu/reproducibility.html .
|
27
|
Evidence of dynamically dysregulated gene expression pathways in hyperresponsive B cells from African American lupus patients. PLoS One 2013; 8:e71397. [PMID: 23977035 PMCID: PMC3744560 DOI: 10.1371/journal.pone.0071397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/20/2012] [Accepted: 06/29/2013] [Indexed: 01/07/2023] Open
Abstract
Recent application of gene expression profiling to the immune system has shown great potential for characterization of complex regulatory processes. It is becoming increasingly important to characterize functional systems through multigene interactions to provide valuable insights into differences between healthy controls and autoimmune patients. Here we apply an original systematic approach to the analysis of changes in regulatory gene interconnections between Epstein-Barr virus-transformed hyperresponsive B cells from SLE patients and normal control B cells. Both traditional analysis of differential gene expression and analysis of the dynamics of gene expression variations were performed in combination to establish model networks of functional gene expression. This Pathway Dysregulation Analysis identified known transcription factors and transcriptional regulators activated uniquely in stimulated B cells from SLE patients.
|
28
|
Epigenetic expansion of VHL-HIF signal output drives multiorgan metastasis in renal cancer. Nat Med 2012; 19:50-6. [PMID: 23223005 PMCID: PMC3540187 DOI: 10.1038/nm.3029] [Citation(s) in RCA: 164] [Impact Index Per Article: 13.7] [Received: 04/19/2012] [Accepted: 11/14/2012] [Indexed: 12/13/2022]
Abstract
Inactivation of the von Hippel-Lindau tumor suppressor gene, VHL, is an archetypical tumor-initiating event in clear cell renal carcinoma (ccRCC) that leads to the activation of hypoxia-inducible transcription factors (HIFs). However, VHL mutation status in ccRCC is not correlated with clinical outcome. Here we show that during ccRCC progression, cancer cells exploit diverse epigenetic alterations to empower a branch of the VHL-HIF pathway for metastasis, and the strength of this activation is associated with poor clinical outcome. By analyzing metastatic subpopulations of VHL-deficient ccRCC cells, we discovered an epigenetically altered VHL-HIF response that is specific to metastatic ccRCC. Focusing on the two most prominent pro-metastatic VHL-HIF target genes, we show that loss of Polycomb repressive complex 2 (PRC2)-dependent histone H3 Lys27 trimethylation (H3K27me3) activates HIF-driven chemokine (C-X-C motif) receptor 4 (CXCR4) expression in support of chemotactic cell invasion, whereas loss of DNA methylation enables HIF-driven cytohesin 1 interacting protein (CYTIP) expression to protect cancer cells from death cytokine signals. Thus, metastasis in ccRCC is based on an epigenetically expanded output of the tumor-initiating pathway.
|
29
|
Genome-wide analysis of gene and protein expression of dysplastic naevus cells. J Skin Cancer 2012; 2012:981308. [PMID: 23251804 PMCID: PMC3515917 DOI: 10.1155/2012/981308] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/09/2012] [Revised: 10/10/2012] [Accepted: 10/11/2012] [Indexed: 01/20/2023] Open
Abstract
Cutaneous melanoma, a type of skin tumor originating from melanocytes, often develops from premalignant naevoid lesions via a gradual transformation process driven by an accumulation of (epi)genetic lesions. These dysplastic naevi display altered morphology and often proliferation of melanocytes. Additionally, melanocytes in dysplastic naevi show structural mitochondrial and melanosomal alterations and have elevated reactive oxygen species (ROS) levels. For this study we performed genome-wide expression and proteomic analysis of melanocytes from dysplastic naevus (DNMC) and adjacent normal skin (MC) from 18 patients. Whole genome expression profiles of the DNMC and MC of each individual patient subjected to GO-based comparative statistical analysis yielded significantly differentially expressed GO classes including “organellar ribosome,” “mitochondrial ribosome,” “hydrogen ion transporter activity,” and “prefoldin complex.” Validation of 5 genes from these top GO classes revealed a heterogeneous differential expression pattern. Proteomic analysis demonstrated differentially expressed proteins in DNMC that are involved in cellular metabolism, detoxification, and cytoskeletal organization processes, such as GTP-binding Rho-like protein CDC42, glutathione-S-transferase omega-1 and prolyl 4-hydroxylase. Collectively these results point to deregulation of cellular processes, such as metabolism and protein synthesis, consistent with the observed elevated oxidative stress levels in DNMC potentially resulting in oxidative DNA damage in these cells.
|
30
|
Genome wide analysis of differentially expressed genes in HK-2 cells, a line of human kidney epithelial cells in response to oxalate. PLoS One 2012; 7:e43886. [PMID: 23028475 PMCID: PMC3446971 DOI: 10.1371/journal.pone.0043886] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/20/2012] [Accepted: 07/27/2012] [Indexed: 11/30/2022] Open
Abstract
Nephrolithiasis is a multi-factorial disease which, in the majority of cases, involves the renal deposition of calcium oxalate. Oxalate is a metabolic end product excreted primarily by the kidney. Previous studies have shown that elevated levels of oxalate are detrimental to the renal epithelial cells; however, oxalate renal epithelial cell interactions are not completely understood. In this study, we utilized an unbiased approach of gene expression profiling using Affymetrix HG_U133_plus2 gene chips to understand the global gene expression changes in human renal epithelial cells [HK-2] after exposure to oxalate. We analyzed the expression of 47,000 transcripts and variants, including 38,500 well characterized human genes, in the HK-2 cells after 4 hours and 24 hours of oxalate exposure. Gene expression was compared among replicates as per the Affymetrix statistical program. Gene expression among various groups was compared using various analytical tools, and differentially expressed genes were classified according to the Gene Ontology Functional Category. The results from this study show that oxalate exposure induces significant expression changes in many genes. We show for the first time that oxalate exposure induces as well as shuts off genes differentially. We found 750 up-regulated and 2276 down-regulated genes which have not been reported before. Our results also show that exposure of renal cells to oxalate results in the regulation of genes that are associated with specific molecular functions, biological processes, and cellular components. In addition, we have identified a set of 20 genes that is differentially regulated by oxalate irrespective of duration of exposure and may be useful in monitoring oxalate nephrotoxicity. Taken together, our studies profile global gene expression changes and provide a unique insight into oxalate renal cell interactions and oxalate nephrotoxicity.
|
31
|
Genetic signatures shared in embryonic liver development and liver cancer define prognostically relevant subgroups in HCC. Mol Cancer 2012; 11:55. [PMID: 22891627 PMCID: PMC3583209 DOI: 10.1186/1476-4598-11-55] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 03/27/2012] [Accepted: 07/12/2012] [Indexed: 12/25/2022] Open
Abstract
Multiple activations of individual genes during embryonic liver and HCC development have repeatedly prompted speculations about conserved embryonic signatures driving cancer development. Recently, the emerging discussion on cancer stem cells and the appreciation that generally tumors may develop from progenitor cells of diverse stages of cellular differentiation has shed increasing light on the overlapping genetic signatures between embryonic liver development and HCC. However, there is still a lack of systematic studies investigating this area. We therefore performed a comprehensive analysis of differentially regulated genetic signaling pathways in embryonic and liver cancer development and investigated their biological relevance. Genetic signaling pathways were investigated on several publicly available genome-wide microarray experiments on liver development and HCC. Differentially expressed genes were investigated for pathway enrichment or underrepresentation compared to KEGG annotated pathways by Fisher exact evaluation. The comparative analysis of enrichment and underrepresentation of differentially regulated genes in liver development and HCC demonstrated a significant overlap between multiple pathways. Most strikingly, we demonstrated a significant overlap not only in pathways expected to be relevant to both conditions, such as cell cycle or apoptosis, but also in metabolic pathways associated with carbohydrate and lipid metabolism. Furthermore, we demonstrated the clinical significance of these findings, as unsupervised clustering of HCC patients on the basis of these metabolic pathways displayed significant differences in survival. These results indicate that liver development and liver cancer share similar alterations in multiple genetic signaling pathways. Several pathways with markedly similar patterns of enrichment or underrepresentation of various regulated genes between liver development and HCC are of prognostic relevance in HCC. In particular, the metabolic pathways were identified as novel prognostically relevant players in HCC development.
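The Fisher exact evaluation of pathway enrichment or underrepresentation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy counts are hypothetical, and the one-sided p-values are computed directly from the hypergeometric distribution using only the Python standard library.

```python
from math import comb


def hypergeom_pmf(k, total, in_pathway, n_de):
    """P(X = k): exactly k pathway genes among n_de differentially expressed genes."""
    return (comb(in_pathway, k) * comb(total - in_pathway, n_de - k)
            / comb(total, n_de))


def fisher_enrichment(k, total, in_pathway, n_de):
    """One-sided Fisher exact p-values for a single pathway.

    k          -- differentially expressed genes annotated to the pathway
    total      -- all genes measured on the array
    in_pathway -- all measured genes annotated to the pathway
    n_de       -- all differentially expressed genes
    Returns (p_enriched, p_underrepresented).
    """
    lowest = max(0, n_de - (total - in_pathway))
    highest = min(in_pathway, n_de)
    p_enriched = sum(hypergeom_pmf(i, total, in_pathway, n_de)
                     for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, total, in_pathway, n_de)
                  for i in range(lowest, k + 1))
    return p_enriched, p_under


# Toy counts (hypothetical): 8 genes measured, 4 in the pathway,
# 4 differentially expressed, 3 of which fall in the pathway.
p_enriched, p_under = fisher_enrichment(k=3, total=8, in_pathway=4, n_de=4)
```

In a real analysis the test would be repeated for every annotated pathway and the resulting p-values adjusted for multiple testing.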
|
32
|
Increased levels of circulating cytokines with HIV-related immunosuppression. AIDS Res Hum Retroviruses 2012; 28:809-15. [PMID: 21962239 DOI: 10.1089/aid.2011.0144] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/12/2023] Open
Abstract
Cytokines may contribute to the severity of CD4 cell depletion with human immunodeficiency virus (HIV) infection, but quantitative relationships are not well defined. Serum and plasma from 181 HIV-infected individuals were tested with Millipore 30-plex Luminex cytokine assays. Within-individual correlations among cytokines were summarized by two-dimensional hierarchical cluster analysis. Associations with age, sex, race, CD4 count, and HIV viral load were determined with linear regression models. Tests for statistical significance were corrected for multiple comparisons, using a false discovery rate of 0.1. African-Americans had significantly higher levels than whites of six cytokines (IL-2, IL-5, IL-7, IL-15, fractalkine, and IFN-γ), and lower levels of MCP-1. Females had higher fractalkine levels than males. Age was not associated with levels of any cytokine. Six cytokines, including the T-helper (Th) type 1 cytokine IL-15, the Th2 cytokines IL-1ra and IL-10, the chemokines fractalkine and MCP-1, and the growth factor G-CSF were each inversely associated with CD4 count; no cytokine was directly associated with CD4 count. Fractalkine was directly associated with HIV viral load, adjusted for CD4 count. Cytokines clustered by primary function (e.g., Th1, Th2, proinflammatory, chemokines, or growth factors) whereas individuals clustered according to cytokine levels (generally high, intermediate, or low) had significantly different CD4 counts [medians (interquartile range) of 60 (17-162), 131 (62-321), and 155 (44-467), respectively; p<0.0001]. CD4 deficiency is associated with generalized increases in cytokines of various functions. Racial differences in cytokine response to HIV infection could contribute to disparities in disease progression.
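The abstract reports correcting for multiple comparisons at a false discovery rate of 0.1 but does not name the procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k
    (1-based, p-values sorted ascending) with p_(k) <= (k / m) * fdr and
    reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five cytokine association tests.
significant = benjamini_hochberg([0.001, 0.04, 0.03, 0.2, 0.5], fdr=0.1)
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.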
|
33
|
Colon cancer molecular subtypes identified by expression profiling and associated to stroma, mucinous type and different clinical behavior. BMC Cancer 2012; 12:260. [PMID: 22712570 PMCID: PMC3571914 DOI: 10.1186/1471-2407-12-260] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Received: 09/28/2011] [Accepted: 05/18/2012] [Indexed: 12/15/2022] Open
Abstract
Background Colon cancer patients with the same stage show diverse clinical behavior due to tumor heterogeneity. We aimed to discover distinct classes of tumors based on microarray expression patterns, to analyze whether the molecular classification correlated with the histopathological stages or other clinical parameters and to study differences in the survival. Methods Hierarchical clustering was performed for class discovery in 88 colon tumors (stages I to IV). Pathways analysis and correlations between clinical parameters and our classification were analyzed. Tumor subtypes were validated using an external set of 78 patients. A 167 gene signature associated to the main subtype was generated using the 3-Nearest-Neighbor method. Coincidences with other prognostic predictors were assessed. Results Hierarchical clustering identified four robust tumor subtypes with biologically and clinically distinct behavior. Stromal components (p < 0.001), nuclear β-catenin (p = 0.021), mucinous histology (p = 0.001), microsatellite-instability (p = 0.039) and BRAF mutations (p < 0.001) were associated to this classification but it was independent of Dukes stages (p = 0.646). Molecular subtypes were established from stage I. High-stroma-subtype showed increased levels of genes and altered pathways distinctive of tumour-associated-stroma and components of the extracellular matrix in contrast to Low-stroma-subtype. Mucinous-subtype was reflected by the increased expression of trefoil factors and mucins as well as by a higher proportion of MSI and BRAF mutations. Tumor subtypes were validated using an external set of 78 patients. A 167 gene signature associated to the Low-stroma-subtype distinguished low risk patients from high risk patients in the external cohort (Dukes B and C:HR = 8.56(2.53-29.01); Dukes B,C and D:HR = 1.87(1.07-3.25)). Eight different reported survival gene signatures segregated our tumors into two groups the Low-stroma-subtype and the other tumor subtypes. 
Conclusions We have identified novel molecular subtypes in colon cancer with distinct biological and clinical behavior that are established from the initiation of the tumor. Tumor microenvironment is important for the classification and for the malignant power of the tumor. Differential gene sets and biological pathways characterize each tumor subtype reflecting underlying mechanisms of carcinogenesis that may be used for the selection of targeted therapeutic procedures. This classification may contribute to an improvement in the management of the patients with CRC and to a more comprehensive prognosis.
|
34
|
Integrative genomic identification of genes on 8p associated with hepatocellular carcinoma progression and patient survival. Gastroenterology 2012; 142:957-966.e12. [PMID: 22202459 PMCID: PMC3321110 DOI: 10.1053/j.gastro.2011.12.039] [Citation(s) in RCA: 253] [Impact Index Per Article: 21.1] [Received: 06/02/2011] [Revised: 12/02/2011] [Accepted: 12/15/2011] [Indexed: 12/28/2022]
Abstract
BACKGROUND & AIMS Hepatocellular carcinoma (HCC) is an aggressive malignancy; its mechanisms of development and progression are poorly understood. We used an integrative approach to identify HCC driver genes, defined as genes whose copy numbers associate with gene expression and cancer progression. METHODS We combined data from high-resolution, array-based comparative genomic hybridization and transcriptome analysis of HCC samples from 76 patients with hepatitis B virus infection with data on patient survival times. Candidate genes were functionally validated using in vitro and in vivo models. RESULTS Unsupervised analyses of array comparative genomic hybridization data associated loss of chromosome 8p with poor outcome (reduced survival time); somatic copy number alterations correlated with expression of 27.3% of genes analyzed. We associated expression levels of 10 of these genes with patient survival times in 2 independent cohorts (comprising 319 cases of HCC with mixed etiology) and 3 breast cancer cohorts (637 cases). Among the 10-gene signature, a cluster of 6 genes on 8p, (DLC1, CCDC25, ELP3, PROSC, SH2D4A, and SORBS3) were deleted in HCCs from patients with poor outcomes. In vitro and in vivo analyses indicated that the products of PROSC, SH2D4A, and SORBS3 have tumor-suppressive activities, along with the known tumor suppressor gene DLC1. CONCLUSIONS We used an unbiased approach to identify 10 genes associated with HCC progression. These might be used in assisting diagnosis and to stage tumors based on gene expression patterns.
|
35
|
|
36
|
Too many numbers: Microarrays in clinical cancer research. STUDIES IN HISTORY AND PHILOSOPHY OF BIOLOGICAL AND BIOMEDICAL SCIENCES 2012; 43:37-51. [PMID: 22326071 DOI: 10.1016/j.shpsc.2011.10.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
|
37
|
|
38
|
Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012; 1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]
Abstract
Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches are available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element depending both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are overviewed and potential areas for additional research discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the given data obtained through microarray experiments.
|
39
|
|
40
|
Identification of a SOX2-dependent subset of tumor- and sphere-forming glioblastoma cells with a distinct tyrosine kinase inhibitor sensitivity profile. Neuro Oncol 2011; 13:1178-91. [PMID: 21940738 PMCID: PMC3199157 DOI: 10.1093/neuonc/nor113] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022] Open
Abstract
Putative cancer stem cells have been identified in glioblastomas and are associated with radio- and chemo-resistance. Further knowledge about these cells is thus highly warranted for the development of better glioblastoma therapies. Gene expression analyses of 11 high-grade glioma cultures identified 2 subsets, designated type A and type B cultures. The type A cultures displayed high expression of CXCR4, SOX2, EAAT1, and GFAP and low expression of CNP, PDGFRB, CXCL12, and extracellular matrix proteins. Clinical significance of the 2 types was indicated by the expression of type A– and type B–defining genes in different clinical glioblastoma samples. Classification of glioblastomas with type A– and type B–defining genes generated 2 groups of tumors composed predominantly of the classical, neural, and/or proneural subsets and the mesenchymal subset, respectively. Furthermore, tumors with EGFR mutations were enriched in the group of type A samples. Type A cultures possessed a higher capacity to form xenograft tumors and neurospheres and displayed low or no sensitivity to monotreatment with PDGF- and IGF-1–receptor inhibitors but were efficiently growth inhibited by combination treatment with low doses of these 2 inhibitors. Furthermore, siRNA-induced downregulation of SOX2 reduced sphere formation of type A cultures, decreased expression of type A–defining genes, and conferred sensitivity to monotreatment with PDGF- and IGF-1–receptor inhibitors. The present study thus describes a tumor- and neurosphere-forming SOX2-dependent subset of glioblastoma cultures characterized by a gene expression signature similar to that of the recently described classical, proneural, and/or neural subsets of glioblastoma. The findings that resistance to PDGF- and IGF-1–receptor inhibitors is related to SOX2 expression and can be overcome by combination treatment should be considered in ongoing efforts to develop novel stem cell–targeting therapies.
|
41
|
Molecular signatures in post-mortem brain tissue of younger individuals at high risk for Alzheimer's disease as based on APOE genotype. Mol Psychiatry 2011; 16:836-47. [PMID: 20479757 PMCID: PMC2953572 DOI: 10.1038/mp.2010.57] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 11/09/2022]
Abstract
Alzheimer's disease (AD) is a neurodegenerative condition characterized histopathologically by neuritic plaques and neurofibrillary tangles. The objective of this transcriptional profiling study was to identify both neurosusceptibility and intrinsic neuroprotective factors at the molecular level, not confounded by the downstream consequences of pathology. We thus studied post-mortem cortical tissue in 28 cases that were non-APOE4 carriers (called the APOE3 group) and 13 cases that were APOE4 carriers. As APOE genotype is the major genetic risk factor for late-onset AD, the former group was at low risk for development of the disease and the latter group was at high risk for the disease. Mean age at death was 42 years and none of the brains had histopathology diagnostic of AD at the time of death. We first derived interregional difference scores in expression between cortical tissue from a region relatively invulnerable to AD (primary somatosensory cortex, BA 1/2/3) and an area known to be susceptible to AD pathology (middle temporal gyrus, BA 21). We then contrasted the magnitude of these interregional differences in between-group comparisons of the APOE3 (low risk) and APOE4 (high risk) genotype groups. We identified 70 transcripts that differed significantly between the groups. These included EGFR, CNTFR, CASP6, GRIA2, CTNNB1, FKBPL, LGALS1 and PSMC5. Using real-time quantitative PCR, we validated these findings. In addition, we found regional differences in the expression of APOE itself. We also identified multiple Kyoto pathways that were disrupted in the APOE4 group, including those involved in mitochondrial function, calcium regulation and cell-cycle reentry. To determine the functional significance of our transcriptional findings, we used bioinformatics pathway analyses to demonstrate that the molecules listed above comprised a network of connections with each other, APOE, and APP and MAPT. Overall, our results indicated that the abnormalities that we observed in single transcripts and in signaling pathways were not the consequences of diagnostic plaque and tangle pathology, but preceded it and thus may be a causative link in the long molecular prodrome that results in clinical AD.
|
42
|
Pathways activated during human asthma exacerbation as revealed by gene expression patterns in blood. PLoS One 2011; 6:e21902. [PMID: 21779351 PMCID: PMC3136489 DOI: 10.1371/journal.pone.0021902] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 05/21/2010] [Accepted: 06/14/2011] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Asthma exacerbations remain a major unmet clinical need. The difficulty in obtaining airway tissue and bronchoalveolar lavage samples during exacerbations has greatly hampered study of naturally occurring exacerbations. This study was conducted to determine if mRNA profiling of peripheral blood mononuclear cells (PBMCs) could provide information on the systemic molecular pathways involved during asthma exacerbations. METHODOLOGY/PRINCIPAL FINDINGS Over the course of one year, gene expression levels during stable asthma, exacerbation, and two weeks after an exacerbation were compared using oligonucleotide arrays. For each of 118 subjects who experienced at least one asthma exacerbation, the gene expression patterns in a sample of peripheral blood mononuclear cells collected during an exacerbation episode were compared to patterns observed in multiple samples from the same subject collected during quiescent asthma. Analysis of covariance identified genes whose levels of expression changed during exacerbations and returned to quiescent levels by two weeks. Heterogeneity among visits in expression profiles was examined using K-means clustering. Three distinct exacerbation-associated gene expression signatures were identified. One signature indicated that, even among patients without symptoms of respiratory infection, genes of innate immunity were activated. Antigen-independent T cell activation mediated by IL15 was also indicated by this signature. A second signature revealed strong evidence of lymphocyte activation through antigen receptors and subsequent downstream events of adaptive immunity. The number of genes identified in the third signature was too few to draw conclusions on the mechanisms driving those exacerbations. CONCLUSIONS/SIGNIFICANCE This study has shown that analysis of PBMCs reveals systemic changes accompanying asthma exacerbation and has laid the foundation for future comparative studies using PBMCs.
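The K-means clustering used in this abstract to examine heterogeneity among visits can be sketched as follows. This is a generic illustration of Lloyd's algorithm in plain Python, not the study's pipeline: the function names, the random initialization, the choice of k, and the toy two-dimensional "profiles" are all assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Toy expression profiles (hypothetical): two well-separated groups of visits.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(profiles, k=2)
```

Cluster numbering is arbitrary, so results are best compared by which observations share a label rather than by the label values themselves.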
|
43
|
Internal standard-based analysis of microarray data2--analysis of functional associations between HVE-genes. Nucleic Acids Res 2011; 39:7881-99. [PMID: 21715372 PMCID: PMC3185418 DOI: 10.1093/nar/gkr503] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023] Open
Abstract
In this work we apply the Internal Standard-based analytical approach described in an earlier communication and present experimental results on functional associations among hypervariably expressed genes (HVE-genes). Our working assumption was that the genetic components that initiate disease involve HVE-genes whose level of expression is indistinguishable between healthy individuals and individuals with pathology. We show that analysis of the functional associations of the HVE-genes is indeed suitable for revealing disease-specific differences. We also show that HVE-genes can be used to characterize pathological alterations through multivariate classification methods. This in turn offers important clues on naturally occurring dynamic processes in the organism and is further used for dynamic discrimination of groups of compared samples. We conclude that our approach can uncover fundamentally new collective differences that cannot be discerned by individual gene analysis.
|
44
|
Gene expression profiling assigns CHEK2 1100delC breast cancers to the luminal intrinsic subtypes. Breast Cancer Res Treat 2011; 132:439-48. [DOI: 10.1007/s10549-011-1588-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 03/03/2011] [Accepted: 05/11/2011] [Indexed: 12/24/2022]
|
45
|
Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement. J Natl Cancer Inst 2011; 103:662-73. [PMID: 21421860 PMCID: PMC3079850 DOI: 10.1093/jnci/djr071] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 01/08/2023] Open
Abstract
Background Breast cancers can be classified by hierarchical clustering using an “intrinsic” gene list into one of at least five molecular subtypes: basal-like, HER2, luminal A, luminal B, and normal breast-like. Five different intrinsic gene lists composed of varying numbers of genes have been used for molecular subtype identification and classification of breast cancers. The aim of this study was to determine the objectivity and interobserver reproducibility of the assignment of molecular subtype classes by hierarchical cluster analysis. Methods Three publicly available breast cancer datasets (n = 779) were subjected to two-way average-linkage hierarchical cluster analysis using five distinct intrinsic gene lists. We used free-marginal Kappa statistics to analyze interobserver agreement among five breast cancer researchers for the whole classification and for each molecular subtype separately according to each intrinsic gene list for each breast cancer dataset. Results None of the classification systems tested produced almost perfect agreement (Kappa ≥ 0.81) among observers. However, substantial interobserver agreement (70.8% to 76.1% of the samples and free-marginal Kappa scores from 0.635 to 0.701) was consistently observed in all datasets for four molecular subtypes (luminal, basal-like, HER2, and normal breast-like). When luminal cancers were subdivided (luminal A, B, and C), none of the classification systems produced substantial agreement (Kappa ≥ 0.61) in all the datasets analyzed. Analysis of each subtype separately revealed that only two (basal-like and HER2) could be reproducibly identified by independent observers (Kappa ≥ 0.81). Conclusions Assignment of molecular subtype classes of breast cancer based on the analysis of dendrograms obtained with hierarchical cluster analysis is subjective and shows modest interobserver reproducibility. For the development of a molecular taxonomy, objective definitions for each molecular subtype and standardized methods for their identification are required.
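The relationship between the percent agreement and the free-marginal Kappa scores quoted in this abstract can be illustrated with a short sketch. It assumes the standard free-marginal (Randolph-style) formula, kappa = (P_o - 1/k) / (1 - 1/k), where P_o is the observed agreement and k the number of categories; the abstract does not state which variant or which k was used. The value k = 5 below is an assumption (for example, four subtypes plus an unclassified category), chosen only because it reproduces the reported 0.635 and 0.701 from 70.8% and 76.1%. The function names are illustrative, not taken from the paper.

```python
def free_marginal_kappa(observed_agreement, n_categories):
    """Free-marginal kappa: chance agreement is fixed at 1 / n_categories."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    chance = 1.0 / n_categories
    return (observed_agreement - chance) / (1.0 - chance)


def multirater_observed_agreement(counts):
    """Observed agreement for several raters.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    agreeing_pairs = 0
    for row in counts:
        if sum(row) != n_raters:
            raise ValueError("every item needs the same number of ratings")
        # Ordered pairs of raters that chose the same category for this item.
        agreeing_pairs += sum(c * (c - 1) for c in row)
    return agreeing_pairs / (len(counts) * n_raters * (n_raters - 1))


# Assumption: k = 5 categories; percent agreement taken from the abstract.
kappa_low = free_marginal_kappa(0.708, 5)
kappa_high = free_marginal_kappa(0.761, 5)

# Hypothetical toy data: two samples rated by five observers.
toy_counts = [[5, 0, 0, 0, 0], [3, 2, 0, 0, 0]]
toy_agreement = multirater_observed_agreement(toy_counts)
```

Under these assumptions, `round(kappa_low, 3)` and `round(kappa_high, 3)` give 0.635 and 0.701, matching the range reported above; with k = 4 the same agreement levels would give lower scores.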
|
46
|
Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing. PLoS One 2011; 6:e17490. [PMID: 21364760 PMCID: PMC3045451 DOI: 10.1371/journal.pone.0017490] [Citation(s) in RCA: 122] [Impact Index Per Article: 9.4] [Received: 11/30/2010] [Accepted: 02/03/2011] [Indexed: 12/21/2022] Open
Abstract
We used deep sequencing technology to profile the transcriptome, gene copy number, and CpG island methylation status simultaneously in eight commonly used breast cell lines to develop a model for how these genomic features are integrated in estrogen receptor positive (ER+) and negative breast cancer. Total mRNA sequencing and profiling of gene copy number and genomic CpG island methylation were carried out using the Illumina Genome Analyzer. Sequences were mapped to the human genome to obtain digitized gene expression data, DNA copy number in reference to the non-tumor cell line (MCF10A), and methylation status of 21,570 CpG islands to identify differentially expressed genes that were correlated with methylation or copy number changes. These were evaluated in a dataset from 129 primary breast tumors. Gene expression in cell lines was dominated by ER-associated genes. ER+ and ER− cell lines formed two distinct, stable clusters, and 1,873 genes were differentially expressed in the two groups. Part of chromosome 8 was deleted in all ER− cells and part of chromosome 17 amplified in all ER+ cells. These loci encoded 30 genes that were overexpressed in ER+ cells; 9 of these genes were overexpressed in ER+ tumors. We identified 149 differentially expressed genes that exhibited differential methylation of one or more CpG islands within 5 kb of the 5′ end of the gene and for which mRNA abundance was inversely correlated with CpG island methylation status. In primary tumors we identified 84 genes that appear to be robust components of the methylation signature that we identified in ER+ cell lines. Our analyses reveal a global pattern of differential CpG island methylation that contributes to the transcriptome landscape of ER+ and ER− breast cancer cells and tumors. The role of gene amplification/deletion appears to be more modest, although several potentially significant genes appear to be regulated by copy number aberrations.
|
47
|
Abstract
In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. 
Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation. In network applications, one is often interested in studying whether modules are preserved across multiple networks. For example, to determine whether a pathway of genes is perturbed in a certain condition, one can study whether its connectivity pattern is no longer preserved. Non-preserved modules can either be biologically uninteresting (e.g., reflecting data outliers) or interesting (e.g., reflecting sex specific modules). An intuitive approach for studying module preservation is to cross-tabulate module membership. But this approach often cannot address questions about the preservation of connectivity patterns between nodes. Thus, cross-tabulation based approaches often fail to recognize that important aspects of a network module are preserved. Cross-tabulation methods make it difficult to argue that a module is not preserved. The weak statement (“the reference module does not overlap with any of the identified test set modules”) is less relevant in practice than the strong statement (“the module cannot be found in the test network irrespective of the parameter settings of the module detection procedure”). Module preservation statistics have important applications, e.g. we show that the wiring of apoptosis genes in a human cortical network differs from that in chimpanzees.
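The cross-tabulation approach mentioned in this abstract can be sketched as follows. This is a minimal illustration with hypothetical module labels, not the authors' R software: it only counts how many genes share each (reference module, test module) pair, which is exactly the kind of overlap summary the abstract argues cannot capture the preservation of connectivity patterns.

```python
from collections import Counter


def cross_tabulate(reference_labels, test_labels):
    """Count genes for each (reference module, test module) label pair."""
    if len(reference_labels) != len(test_labels):
        raise ValueError("both label lists must cover the same genes")
    return Counter(zip(reference_labels, test_labels))


# Hypothetical module assignments for six genes in two networks.
reference_modules = ["blue", "blue", "blue", "red", "red", "grey"]
test_modules = ["m1", "m1", "m2", "m2", "m2", "m3"]

table = cross_tabulate(reference_modules, test_modules)

# Largest overlap of the reference "blue" module with any test module.
blue_best = max(
    count for (ref, _), count in table.items() if ref == "blue"
)
```

Here the reference "blue" module shares at most two of its three genes with a single test module, but that number says nothing about whether the connections among those genes are retained in the test network.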
|
48
|
Hand impairment in systemic sclerosis: association of different hand indices with organ involvement. Scand J Rheumatol 2010; 39:393-7. [PMID: 20476855 DOI: 10.3109/03009741003629028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To evaluate the association between the assessment tools used to quantify hand impairment and organ involvement in patients with systemic sclerosis (SSc). METHODS Eighty consecutive SSc patients were assessed for hand impairment using the Hand Anatomic Index (HAI), finger-to-palm distance in flexion (FTP), and the Hand Mobility in Scleroderma (HAMIS) test. Cluster analysis was used to identify patients having similar characteristics on the basis of the pattern of organ involvement in order to create clinically homogeneous groups, and to correlate these clusters with the measures of hand involvement. Finally, we evaluated the discriminating ability of the indices to identify the patients whose clinical condition was more severe. RESULTS Two major clusters were identified by cluster analysis on the basis of organ involvement. The first (cluster A) included 61 patients and the second (cluster B) 19 patients characterized by minor and major extent of organ involvement, respectively. The extent of organ involvement and the hand impairment were related. The scores of hand indices were lower in cluster B. The area under the receiver operating characteristic (ROC) curve (C-index) for the logistic model including all three indices was 0.85 (95% confidence interval 0.74–0.95). CONCLUSION The seriousness of hand involvement as measured by the three indices was associated with the extent of organ involvement. Further studies of hand impairment scales are needed to provide validated guidance as meaningful clinical measures.
|
49
|
Enumerating the gene sets in breast cancer, a "direct" alternative to hierarchical clustering. BMC Genomics 2010; 11:482. [PMID: 20731868 PMCID: PMC2996978 DOI: 10.1186/1471-2164-11-482] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/28/2009] [Accepted: 08/23/2010] [Indexed: 11/10/2022] Open
Abstract
Background Two-way hierarchical clustering, with results visualized as heatmaps, has served as the method of choice for exploring structure in large matrices of expression data since the advent of microarrays. While it has delivered important insights, including a typology of breast cancer subtypes, it suffers from instability in the face of gene or sample selection, and an inability to detect small sets that may be dominated by larger sets such as the estrogen-related genes in breast cancer. The rank-based partitioning algorithm introduced in this paper addresses several of these limitations. It delivers results comparable to two-way hierarchical clustering, and much more. Applied systematically across a range of parameter settings, it enumerates all the partition-inducing gene sets in a matrix of expression values. Results Applied to four large breast cancer datasets, this alternative exploratory method detects more than thirty sets of co-regulated genes, many of which are conserved across experiments and across platforms. Many of these sets are readily identified in biological terms, e.g., "estrogen", "erbb2", and 8p11-12, and several are clinically significant as prognostic of either increased survival ("adipose", "stromal", ...) or diminished survival ("proliferation", "immune/interferon", "histone", ...). Of special interest are the sets that effectively factor "immune response" and "stromal signalling". Conclusion The gene sets induced by the enumeration include many of the sets reported in the literature. In this regard these inventories confirm and consolidate findings from microarray-based work on breast cancer over the last decade. However, the enumerations also identify gene sets that have not yet been studied, some of which are prognostic of survival. The sets induced are robust, biologically meaningful, and serve to reveal a finer structure in existing breast cancer microarrays.
|
50
|
Abstract
Carcinomas may arise as a disorder of regeneration, so that a malignant cell may represent a failure to fully attain the characteristics of differentiated tissue. We hypothesized that there is a differential distribution of progenitor cell markers among different histological types of lung cancers, with poorly differentiated tumors being more likely to express progenitor stem cell markers. The study was limited to paraffin-embedded archival material of resected untreated pulmonary carcinomas, including adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and small cell carcinoma. The sections were stained for putative stem cell markers (Musashi-1, Musashi-2, CD34, CD21, KIT, CD133, p63, and OCT-4). Positivity was read as isolated, focal, or diffuse staining. Stem cell markers were detected in all histological types of pulmonary carcinomas. There was a difference in the expression of markers among the histological types. Small cell carcinoma showed diffuse positivity for most of the markers, in contrast to focal or negative staining in the other histological groups. An inverse relationship between CD21 and Musashi-1 was observed. No staining for OCT-4 and CD34 was seen in any of the tumor types. Hierarchical clustering based on marker expression separated tumors into two groups; one group, marked by high expression of Musashi-1 and KIT, contained most of the poorly differentiated adenocarcinomas and small cell carcinomas. Therefore, stem cell markers are expressed in lung cancers, with different patterns seen for different histological types and degrees of differentiation.
|