1
Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022; 2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)
Affiliation(s)
- Nicolas Borisov
  - Omicsway Corp., Walnut, California; Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Maksim Sorokin
  - Omicsway Corp., Walnut, California; Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia; I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Marianna Zolotovskaya
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia; Oncobox Ltd., Moscow, Russia
- Anton Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia; PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
2
Orench-Rivera N, Kuehn MJ. Differential Packaging Into Outer Membrane Vesicles Upon Oxidative Stress Reveals a General Mechanism for Cargo Selectivity. Front Microbiol 2021; 12:561863. [PMID: 34276573 PMCID: PMC8284480 DOI: 10.3389/fmicb.2021.561863] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/13/2020] [Accepted: 06/09/2021] [Indexed: 12/12/2022] Open
Abstract
Selective cargo packaging into bacterial extracellular vesicles has been reported and implicated in many biological processes; however, the mechanism behind the selectivity has remained largely unexplored. In this study, proteomic analysis of outer membrane (OM) and OM vesicle (OMV) fractions from enterotoxigenic E. coli revealed significant differences in protein abundance in the OMV and OM fractions for cultures shifted to oxidative stress conditions. Analysis of sequences of proteins preferentially packaged into OMVs showed that proteins with oxidizable residues were more frequently packaged into OMVs in comparison with those retained in the membrane. In addition, the results indicated that two distinct classes of OM-associated proteins were differentially packaged into OMVs as a function of peroxide treatment. Implementing a Bayesian hierarchical model, OM lipoproteins were determined to be preferentially exported during stress whereas integral OM proteins were preferentially retained in the cell. Selectivity was determined to be independent of transcriptional regulation of the proteins upon oxidative stress and was validated using randomly selected protein candidates from the different cargo classes. Based on these data, a hypothetical functional and mechanistic basis for cargo selectivity was tested using OmpA constructs. Our study reveals a basic mechanism for cargo selectivity into OMVs that may be useful for the engineering of OMVs for future biotechnological applications.
Affiliation(s)
- Meta J. Kuehn
  - Department of Biochemistry, Duke University Medical Center, Durham, NC, United States
3
Transcriptomic analysis of castration, chemo-resistant and metastatic prostate cancer elucidates complex genetic crosstalk leading to disease progression. Funct Integr Genomics 2021; 21:451-472. [PMID: 34184132 DOI: 10.1007/s10142-021-00789-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/05/2020] [Revised: 06/05/2020] [Accepted: 05/06/2021] [Indexed: 12/22/2022]
Abstract
Prostate adenocarcinoma, with its rising numbers and high fatality rate, is a daunting healthcare challenge to clinicians and researchers alike. The mainstay of our meta-analysis was to decipher differentially expressed genes (DEGs), their corresponding transcription factors (TFs), miRNAs (microRNA) and interacting pathways underlying the progression of prostate cancer (PCa). We have chosen multiple datasets from primary, castration-resistant, chemo-resistant and metastatic prostate cancer stages for investigation. From our tissue-specific and disease-specific co-expression networks, fifteen hub genes such as ACTB, ACTN1, CDH1, CDKN1A, DDX21, ELF3, FLNA, FLNC, IKZF1, ILK, KRT13, KRT18, KRT19, SVIL and TRIM29 were identified and validated by molecular complex detection analysis as well as survival analysis. In our attempt to highlight hub gene-associated mutations and drug interactions, FLNC was found to be most commonly mutated and CDKN1A gene was found to have highest druggability. Moreover, from DAVID and gene set enrichment analysis, the focal adhesion and oestrogen signalling pathways were found enriched which indicates the involvement of hub genes in tumour invasiveness and metastasis. Finally by Enrichr tool and miRNet, we identified transcriptional factors SNAI2, TP63, CEBPB and KLF11 and microRNAs, namely hsa-mir-1-3p, hsa-mir-145-5p, hsa-mir-124-3p and hsa-mir-218-5p significantly controlling the hub gene expressions. In a nutshell, our report will help to gain a deeper insight into complex molecular intricacies and thereby unveil the probable biomarkers and therapeutic targets involved with PCa progression.
4
Zhang S, Shao J, Yu D, Qiu X, Zhang J. MatchMixeR: a cross-platform normalization method for gene expression data integration. Bioinformatics 2020; 36:2486-2491. [PMID: 31904810 DOI: 10.1093/bioinformatics/btz974] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 09/19/2019] [Accepted: 12/31/2019] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION: Combining gene expression (GE) profiles generated from different platforms enables studies that were previously infeasible due to sample size limitations. Several cross-platform normalization methods have been developed to remove the systematic differences between platforms, but they may also remove meaningful biological differences among datasets. In this work, we propose a novel approach that removes the platform differences but not the biological ones. In this approach, dubbed 'MatchMixeR', we model platform differences by a linear mixed effects regression (LMER) model, and estimate them from matched GE profiles of the same cell line or tissue measured on different platforms. The resulting model can then be used to remove platform differences in other datasets. By using LMER, we achieve a better bias-variance trade-off in parameter estimation. We also design a computationally efficient algorithm based on the moment method, which is ideal for ultra-high-dimensional LMER analysis. RESULTS: Compared with several prominent competing methods, MatchMixeR achieved the highest after-normalization concordance. Subsequent differential expression analyses based on datasets integrated from different platforms showed that using MatchMixeR achieved the best trade-off between true and false discoveries, and this advantage is more apparent in datasets with limited samples or unbalanced group proportions. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in an R package, 'MatchMixeR', freely available at: https://github.com/dy16b/Cross-Platform-Normalization. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Serin Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Jiang Shao
  - Gilead Sciences Inc., Foster City, CA 94404, USA
- Disa Yu
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Xing Qiu
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14624, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
5
Schmidt F, List M, Cukuroglu E, Köhler S, Göke J, Schulz MH. An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets. Bioinformatics 2019; 34:i908-i916. [PMID: 30423059 PMCID: PMC6129283 DOI: 10.1093/bioinformatics/bty553] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/04/2022] Open
Abstract
Motivation: International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study if batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine if batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers. Results: We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish if batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the-art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here. Availability and implementation: Our method is available online at https://github.com/SchulzLab/OntologyEval. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Schmidt
  - Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Cluster of Excellence MMCI, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany; Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
- Markus List
  - Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Engin Cukuroglu
  - Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
- Jonathan Göke
  - Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
- Marcel H Schulz
  - Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Cluster of Excellence MMCI, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany; German Center for Cardiovascular Research, Partner Site Rhein-Main, Frankfurt am Main, Germany
6
Buzdin A, Sorokin M, Garazha A, Glusker A, Aleshin A, Poddubskaya E, Sekacheva M, Kim E, Gaifullin N, Giese A, Seryakov A, Rumiantsev P, Moshkovskii S, Moiseev A. RNA sequencing for research and diagnostics in clinical oncology. Semin Cancer Biol 2019; 60:311-323. [PMID: 31412295 DOI: 10.1016/j.semcancer.2019.07.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 06/30/2019] [Accepted: 07/16/2019] [Indexed: 12/26/2022]
Abstract
Molecular diagnostics is becoming one of the major drivers of personalized oncology. With hundreds of different approved anticancer drugs and regimens of their administration, selecting the proper treatment for a patient is, at the least, a nontrivial task. This is especially true for recurrent and metastatic cancers where the standard lines of therapy have failed. Recent trials demonstrated that mutation assays have a strong limitation in the personalized selection of therapeutics; consequently, most of the drugs cannot be ranked and only a small percentage of patients can benefit from the screening. Other approaches are, therefore, needed to address the problem of finding proper targeted therapies. The analysis of RNA expression (transcriptomic) profiles presents a reasonable solution because transcriptomics stands a few steps closer to the tumor phenotype than genome analysis. Several recent studies pioneered the use of transcriptomics in practical oncology and showed truly encouraging clinical results. The possibility of directly measuring expression levels of molecular drug targets and profiling activation of the relevant molecular pathways enables personalized prioritization of all types of molecular-targeted therapies. RNA sequencing is the most robust tool for high-throughput quantitative transcriptomics. Its use, potential, and limitations in clinical oncology are reviewed here along with technical aspects such as optimal types of biosamples, RNA sequencing profile normalization, quality controls and several levels of data analysis.
Affiliation(s)
- Anton Buzdin
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Maxim Sorokin
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Alex Aleshin
  - Stanford University School of Medicine, Stanford, 94305, CA, USA
- Elena Poddubskaya
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Vitamed Oncological Clinics, Moscow, Russia
- Marina Sekacheva
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
  - Johannes Gutenberg University Mainz, Mainz, Germany
- Nurshat Gaifullin
  - Lomonosov Moscow State University, Faculty of Medicine, Moscow, Russia
- Sergey Moshkovskii
  - Institute of Biomedical Chemistry, Moscow, 119121, Russia; Pirogov Russian National Research Medical University (RNRMU), Moscow, 117997, Russia
- Alexey Moiseev
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia
7
Huo Z, Song C, Tseng G. BAYESIAN LATENT HIERARCHICAL MODEL FOR TRANSCRIPTOMIC META-ANALYSIS TO DETECT BIOMARKERS WITH CLUSTERED META-PATTERNS OF DIFFERENTIAL EXPRESSION SIGNALS. Ann Appl Stat 2019; 13:340-366. [PMID: 31007807 PMCID: PMC6472949 DOI: 10.1214/18-aoas1188] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/12/2023]
Abstract
Due to the rapid development of high-throughput experimental techniques and fast-dropping prices, many transcriptomic datasets have been generated and accumulated in the public domain. Meta-analysis combining multiple transcriptomic studies can increase the statistical power to detect disease-related biomarkers. In this paper, we introduce a Bayesian latent hierarchical model to perform transcriptomic meta-analysis. This method is capable of detecting genes that are differentially expressed (DE) in only a subset of the combined studies, and the latent variables help quantify homogeneous and heterogeneous differential expression signals across studies. A tight clustering algorithm is applied to detected biomarkers to capture differential meta-patterns that are informative to guide further biological investigation. Simulations and three examples, including a microarray dataset from metabolism-related knockout mice, an RNA-seq dataset from HIV transgenic rats, and cross-platform datasets from human breast cancer, are used to demonstrate the performance of the proposed method.
Affiliation(s)
- Zhiguang Huo
  - Department of Biostatistics, University of Florida, Gainesville, FL 32611
- Chi Song
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210
- George Tseng
  - Department of Biostatistics, Human Genetics and Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
8
Borisov N, Shabalina I, Tkachev V, Sorokin M, Garazha A, Pulin A, Eremin II, Buzdin A. Shambhala: a platform-agnostic data harmonizer for gene expression data. BMC Bioinformatics 2019; 20:66. [PMID: 30727942 PMCID: PMC6366102 DOI: 10.1186/s12859-019-2641-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/14/2018] [Accepted: 01/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. Results: Unlike previously published methods enabling good quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparisons. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. Conclusion: Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method supporting sample-specific and platform-independent biologically meaningful clustering for the data obtained from multiple experimental platforms. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2641-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nicolas Borisov
  - I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Irina Shabalina
  - Faculty of Mathematics and Information Technologies, Petrozavodsk State University, Anokhina str., 20, Petrozavodsk, 185910, Russia
- Victor Tkachev
  - Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Maxim Sorokin
  - I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
- Andrew Garazha
  - Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Laboratory of Bioinformatics, Oncology and Immunology, D. Rogachyov Federal Research Center of Pediatric Hematology, Moscow, 117198, Russia
- Andrey Pulin
  - Laboratory for Cell Biology and Developmental Pathology, Federal State Institution "Institute of General Pathology and Pathophysiology", FSBSI "IGPP", Moscow, Russia
- Ilya I Eremin
  - Department for Regenerative Medicine, JSC Generium, Moscow, Russia
- Anton Buzdin
  - I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
9
Zabeck H, Dienemann H, Hoffmann H, Pfannschmidt J, Warth A, Schnabel PA, Muley T, Meister M, Sültmann H, Fröhlich H, Kuner R, Lasitschka F. Molecular signatures in IASLC/ATS/ERS classified growth patterns of lung adenocarcinoma. PLoS One 2018; 13:e0206132. [PMID: 30352093 PMCID: PMC6198952 DOI: 10.1371/journal.pone.0206132] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/04/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: The current classification of human lung adenocarcinoma defines five different histological growth patterns within the group of conventional invasive adenocarcinomas. The five growth patterns are characterised by their typical architecture, but also by variable tumor biological behaviour. AIMS: The aim of this study was to identify specific gene signatures of the five adenocarcinoma growth patterns defined by the joint IASLC/ATS/ERS working group. METHODS: Total RNA from microdissected adenocarcinoma tissue samples of ten lepidic, ten acinar, ten solid, nine papillary, and nine micropapillary tumor portions was isolated and prepared for gene expression analysis. Differential expression of genes was determined using the R package "LIMMA". The overall significance of each signature was assessed via global test. Gene ontology statistics were analysed using GOstat. For immunohistochemical validation, tissue specimens from 20 tumors with solid and 20 tumors with lepidic growth pattern were used. RESULTS: Microarray analyses between the growth patterns resulted in numerous differentially expressed genes between the solid architecture and other patterns. The comparison of transcriptomic activity in the solid and lepidic patterns revealed 705 up- and 110 downregulated non-redundant genes. The pattern-specific protein expression of Inositol-1,4,5-trisphosphate-kinase-A (ITPKA) and angiogenin by immunohistochemistry confirmed the RNA levels. The strongest differences in protein expression between the two patterns were shown for ITPKA (p = 0.02) and angiogenin (p = 0.113). CONCLUSIONS: In this study growth pattern-specific gene signatures in pulmonary adenocarcinoma were identified and distinct transcriptomic differences between lung adenocarcinoma growth patterns were defined. The study provides valuable new information about pulmonary adenocarcinoma and allows a better assessment of the five adenocarcinoma subgroups.
Affiliation(s)
- Heike Zabeck
  - Department of Thoracic Surgery, Thoraxklinik, University Hospital Heidelberg, Heidelberg, Germany
- Hendrik Dienemann
  - Department of Thoracic Surgery, Thoraxklinik, University Hospital Heidelberg, Heidelberg, Germany
- Hans Hoffmann
  - Department of Thoracic Surgery, Thoraxklinik, University Hospital Heidelberg, Heidelberg, Germany
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
- Joachim Pfannschmidt
  - Department of Thoracic Surgery, Thoraxklinik, University Hospital Heidelberg, Heidelberg, Germany
- Arne Warth
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
  - Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Thomas Muley
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
  - Translational Research Unit (STF), Thoraxklinik, University of Heidelberg, Heidelberg, Germany
- Michael Meister
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
  - Translational Research Unit (STF), Thoraxklinik, University of Heidelberg, Heidelberg, Germany
- Holger Sültmann
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
  - Cancer Genome Research (B063), German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Holger Fröhlich
  - Institute for Computer Science, c/o Bonn-Aachen International Center for IT, Algorithmic Bioinformatics, University of Bonn, Bonn, Germany
- Ruprecht Kuner
  - Translational Lung Research Centre Heidelberg (TLRC-H), German Centre for Lung Research (DZL), Heidelberg, Germany
  - Cancer Genome Research (B063), German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Felix Lasitschka
  - Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
10
Long NP, Jung KH, Yoon SJ, Anh NH, Nghi TD, Kang YP, Yan HH, Min JE, Hong SS, Kwon SW. Systematic assessment of cervical cancer initiation and progression uncovers genetic panels for deep learning-based early diagnosis and proposes novel diagnostic and prognostic biomarkers. Oncotarget 2017; 8:109436-109456. [PMID: 29312619 PMCID: PMC5752532 DOI: 10.18632/oncotarget.22689] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/15/2017] [Accepted: 10/27/2017] [Indexed: 12/18/2022] Open
Abstract
Although many outstanding achievements in the management of cervical cancer (CxCa) have been obtained, it still imposes a major burden, which has prompted scientists to discover and validate new CxCa biomarkers to improve the diagnostic and prognostic assessment of CxCa. In this study, eight different gene expression data sets containing 202 cancer, 115 cervical intraepithelial neoplasia (CIN), and 105 normal samples were utilized for an integrative systems biology assessment in a multi-stage carcinogenesis manner. Deep learning-based diagnostic models were established based on the genetic panels of intrinsic genes of cervical carcinogenesis as well as on the unbiased variable selection approach. Survival analysis was also conducted to explore the potential biomarker candidates for prognostic assessment. Our results showed that cell cycle, RNA transport, mRNA surveillance, and one carbon pool by folate were the key regulatory mechanisms involved in the initiation, progression, and metastasis of CxCa. Various genetic panels combined with machine learning algorithms successfully differentiated CxCa from CIN and normalcy in cross-study normalized data sets. In particular, the 168-gene deep learning model for the differentiation of cancer from normalcy achieved an externally validated accuracy of 97.96% (99.01% sensitivity and 95.65% specificity). Survival analysis revealed that ZNF281 and EPHB6 were the two most promising prognostic genetic markers for CxCa among others. Our findings open new opportunities to enhance current understanding of the characteristics of CxCa pathobiology. In addition, the combination of transcriptomics-based signatures and deep learning classification may become an important approach to improve CxCa diagnosis and management in clinical practice.
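The accuracy, sensitivity, and specificity quoted in this abstract are linked through the standard confusion-matrix definitions. The sketch below is only an illustration of that arithmetic: the function name `confusion_metrics` and the four counts are hypothetical and are not taken from the paper. The counts were chosen solely because they happen to reproduce the rounded percentages; other counts could do so as well.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions.

    tp/fn: cancer samples classified correctly/incorrectly
    tn/fp: normal samples classified correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical validation counts (an assumption, NOT reported in the abstract).
acc, sens, spec = confusion_metrics(tp=100, fn=1, tn=44, fp=2)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it must lie between the two, which the reported figures satisfy.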
Affiliation(s)
- Kyung Hee Jung
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Sang Jun Yoon
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Nguyen Hoang Anh
  - School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Tran Diem Nghi
  - School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Yun Pyo Kang
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Hong Hua Yan
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Jung Eun Min
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Soon-Sun Hong
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Sung Won Kwon
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
  - Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
11
Controlling for Confounding Effects in Single Cell RNA Sequencing Studies Using both Control and Target Genes. Sci Rep 2017; 7:13587. [PMID: 29051597 PMCID: PMC5648789 DOI: 10.1038/s41598-017-13665-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/18/2017] [Accepted: 09/29/2017] [Indexed: 11/24/2022] Open
Abstract
The single cell RNA sequencing (scRNAseq) technique is becoming increasingly popular for unbiased, high-resolution transcriptome analysis of heterogeneous cell populations. Despite its many advantages, scRNAseq, like any other genomic sequencing technique, is susceptible to the influence of confounding effects. Controlling for confounding effects in scRNAseq data is a crucial step for accurate downstream analysis. Here, we present a novel statistical method, which we refer to as scPLS (single cell partial least squares), for robust and accurate inference of confounding effects. scPLS takes advantage of the fact that genes in a scRNAseq study can often be naturally classified into two sets: a control set of genes that are free of effects of the predictor variables and a target set of genes that are of primary interest. By modeling the two sets of genes jointly using partial least squares regression, scPLS is capable of making full use of the data to improve the inference of confounding effects. With extensive simulations and comparisons with other methods, we demonstrate the effectiveness of scPLS. Finally, we apply scPLS to two scRNAseq data sets to illustrate its benefits in removing technical confounding effects as well as cell cycle effects.
12
Freytag S, Burgess R, Oliver KL, Bahlo M. brain-coX: investigating and visualising gene co-expression in seven human brain transcriptomic datasets. Genome Med 2017; 9:55. [PMID: 28595657 PMCID: PMC5465565 DOI: 10.1186/s13073-017-0444-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/21/2016] [Accepted: 05/26/2017] [Indexed: 12/17/2022]
Abstract
Background: The pathogenesis of neurological and mental health disorders often involves multiple genes, complex interactions, and brain- and development-specific biological mechanisms. These characteristics make identification of disease genes for such disorders challenging, as conventional prioritisation tools are not specifically tailored to deal with the complexity of the human brain. Thus, we developed a novel web application, brain-coX, that offers gene prioritisation with accompanying visualisations based on seven gene expression datasets from the post-mortem human brain, the largest such resource ever assembled. Results: We tested whether our tool can correctly prioritise known genes from 37 brain-specific KEGG pathways and 17 psychiatric conditions. We achieved an average sensitivity of nearly 50% while reaching a specificity of approximately 75%. We also compared brain-coX’s performance to that of its main competitors, Endeavour and ToppGene, focusing on the ability to discover novel associations. Using a subset of the curated SFARI autism gene collection, we show that brain-coX’s prioritisations are most similar to SFARI’s own curated gene classifications. Conclusions: brain-coX is the first prioritisation and visualisation web tool targeted to the human brain and can be freely accessed via http://shiny.bioinf.wehi.edu.au/freytag.s/. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-017-0444-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Saskia Freytag
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia
  - Department of Medical Biology, University of Melbourne, 1G Royale Parade, 3052, Parkville, Australia
- Rosemary Burgess
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Karen L Oliver
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia
  - Department of Medical Biology, University of Melbourne, 1G Royale Parade, 3052, Parkville, Australia
  - School of Mathematics and Statistics, University of Melbourne, 3010, Parkville, Australia
13
Ozerov IV, Lezhnina KV, Izumchenko E, Artemov AV, Medintsev S, Vanhaelen Q, Aliper A, Vijg J, Osipov AN, Labat I, West MD, Buzdin A, Cantor CR, Nikolsky Y, Borisov N, Irincheeva I, Khokhlovich E, Sidransky D, Camargo ML, Zhavoronkov A. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development. Nat Commun 2016; 7:13427. [PMID: 27848968 PMCID: PMC5116087 DOI: 10.1038/ncomms13427] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 03/07/2016] [Accepted: 10/03/2016] [Indexed: 01/02/2023]
Abstract
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
Affiliation(s)
- Ivan V Ozerov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Ksenia V Lezhnina
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Evgeny Izumchenko
  - The Johns Hopkins University, School of Medicine, Department of Otolaryngology, Head and Neck Cancer Research, 1550 Orleans Street, Baltimore, Maryland 21231, USA
- Artem V Artemov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Sergey Medintsev
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Quentin Vanhaelen
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Alexander Aliper
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
- Jan Vijg
  - Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA
- Andreyan N Osipov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
- Ivan Labat
  - BioTime, Inc., 1010 Atlantic Avenue, Alameda, California 94501, USA
- Michael D West
  - BioTime, Inc., 1010 Atlantic Avenue, Alameda, California 94501, USA
- Anton Buzdin
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
  - National Research Centre 'Kurchatov Institute', Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, 1, Akademika Kurchatova square, Moscow 123182, Russia
- Charles R Cantor
  - Boston University, Department of Biomedical Engineering, 44 Cummington Street, Boston, Massachusetts 02215, USA
- Yuri Nikolsky
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Skolkovo Foundation, 5 Nobelya street, Skolkovo Innovation Centre, Mozhajskij region, Moscow 143026, Russia
- Nikolay Borisov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
  - National Research Centre 'Kurchatov Institute', Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, 1, Akademika Kurchatova square, Moscow 123182, Russia
- Irina Irincheeva
  - Nutrition and Metabolic Health group, Nestlé Institute of Health Sciences, CH-1015 Lausanne, Switzerland
- Edward Khokhlovich
  - Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- David Sidransky
  - The Johns Hopkins University, School of Medicine, Department of Otolaryngology, Head and Neck Cancer Research, 1550 Orleans Street, Baltimore, Maryland 21231, USA
- Miguel Luiz Camargo
  - Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Alex Zhavoronkov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
  - Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
  - The Biogerontology Research Foundation, 2354 Chynoweth House, Trevissome Park, Truro TR4 8UN, UK
14
Ma C, Sastry KS, Flore M, Gehani S, Al-Bozom I, Feng Y, Serpedin E, Chouchane L, Chen Y, Huang Y. CrossLink: a novel method for cross-condition classification of cancer subtypes. BMC Genomics 2016; 17 Suppl 7:549. [PMID: 27556419 PMCID: PMC5001207 DOI: 10.1186/s12864-016-2903-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. Conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets have the same distribution under all conditions, because the class-specific gene signatures change with the condition. Therefore, a trained classifier may work well under one condition but not under another. METHODS: To address this problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. RESULTS: We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on its gene expression profile. CONCLUSIONS: A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
Affiliation(s)
- Chifeng Ma
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, USA
- Konduru S Sastry
  - Weill Cornell Medicine-Qatar, Doha, Qatar
  - Division of Translational Medicine, Sidra Medical and Research Center, Doha, Qatar
- Mario Flore
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, USA
- Yusheng Feng
  - Department of Mechanical Engineering, University of Texas at San Antonio, San Antonio, TX, USA
- Erchin Serpedin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Yidong Chen
  - Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
  - Greehey Children Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, USA
  - Greehey Children Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
15
Müller C, Schillert A, Röthemeier C, Trégouët DA, Proust C, Binder H, Pfeiffer N, Beutel M, Lackner KJ, Schnabel RB, Tiret L, Wild PS, Blankenberg S, Zeller T, Ziegler A. Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data. PLoS One 2016; 11:e0156594. [PMID: 27272489 PMCID: PMC4896498 DOI: 10.1371/journal.pone.0156594] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 09/09/2015] [Accepted: 05/17/2016] [Indexed: 12/13/2022]
Abstract
Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed at identifying a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. All other methods did not substantially reduce batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data.
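The first step of the winning combination described above, quantile normalization performed separately in each batch, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the genes-by-samples layout, and the toy matrix are hypothetical, ties are not averaged, and the subsequent ComBat step is not implemented here (ComBat itself is distributed in the sva Bioconductor package for R).

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Every sample ends up with the same set of values: the mean of the k-th
    smallest value across samples replaces each sample's k-th smallest value.
    Ties are not averaged in this simplified sketch.
    """
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0)                # gene order within each sample
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    reference = sorted_vals.mean(axis=1)              # shared reference distribution
    normalized = np.empty_like(matrix)
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


def quantile_normalize_per_batch(matrix, batches):
    """Apply quantile normalization independently within each batch of samples."""
    matrix = np.asarray(matrix, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(matrix)
    for batch in np.unique(batches):
        cols = batches == batch                       # boolean mask of this batch's samples
        out[:, cols] = quantile_normalize(matrix[:, cols])
    return out


# Toy data (hypothetical): 3 genes x 4 samples, two batches of two samples each.
expr = np.array([[5.0, 4.0, 50.0, 40.0],
                 [2.0, 1.0, 20.0, 10.0],
                 [3.0, 6.0, 30.0, 60.0]])
print(quantile_normalize_per_batch(expr, ["t0", "t0", "t1", "t1"]))
```

After this step the samples within a batch share one distribution, but the batches still differ from each other; removing that between-batch shift is the job of the batch-correction method applied afterwards.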
Affiliation(s)
- Christian Müller
  - Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, 20246, Germany
  - Institute of Medical Biometry and Statistics, University Medical Center Schleswig-Holstein, Campus Luebeck, Lübeck, 23562, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
- Arne Schillert
  - Institute of Medical Biometry and Statistics, University Medical Center Schleswig-Holstein, Campus Luebeck, Lübeck, 23562, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
- Caroline Röthemeier
  - Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, 20246, Germany
- David-Alexandre Trégouët
  - Institut National de la Santé et de la Recherche Médicale (INSERM), Unité Mixte de Recherche (UMR) en Santé 1166, F-75013, Paris, France
  - Institute for Cardiometabolism and Nutrition (ICAN), F-75013 Paris, France
  - Sorbonne Universités, Université Pierre et Marie Curie (UPMC Univ Paris 06), UMR_S1166, Team Genomics & Pathophysiology of Cardiovascular Diseases, F-75013, Paris, France
- Carole Proust
  - Institut National de la Santé et de la Recherche Médicale (INSERM), Unité Mixte de Recherche (UMR) en Santé 1166, F-75013, Paris, France
  - Institute for Cardiometabolism and Nutrition (ICAN), F-75013 Paris, France
  - Sorbonne Universités, Université Pierre et Marie Curie (UPMC Univ Paris 06), UMR_S1166, Team Genomics & Pathophysiology of Cardiovascular Diseases, F-75013, Paris, France
- Harald Binder
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI) at the University Medical Center of the Johannes Gutenberg University Mainz, Mainz, 55131, Germany
- Norbert Pfeiffer
  - Experimental Ophthalmology, Department of Ophthalmology, University Medical Center of the Johannes Gutenberg University, Mainz, 55131, Germany
- Manfred Beutel
  - Department of Psychosomatic Medicine and Psychotherapy, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
- Karl J. Lackner
  - Institute for Clinical Chemistry and Laboratory Medicine, University Medical Center Mainz, Mainz, Germany
- Renate B. Schnabel
  - Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, 20246, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
- Laurence Tiret
  - Institut National de la Santé et de la Recherche Médicale (INSERM), Unité Mixte de Recherche (UMR) en Santé 1166, F-75013, Paris, France
  - Institute for Cardiometabolism and Nutrition (ICAN), F-75013 Paris, France
  - Sorbonne Universités, Université Pierre et Marie Curie (UPMC Univ Paris 06), UMR_S1166, Team Genomics & Pathophysiology of Cardiovascular Diseases, F-75013, Paris, France
- Philipp S. Wild
  - Preventive Cardiology and Preventive Medicine, Center for Cardiology, University Medical Center Mainz, Mainz, 55131, Germany
  - Center for Thrombosis and Hemostasis, University Medical Center Mainz, Mainz, 55131, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Rhine Main, Mainz, 55131, Germany
- Stefan Blankenberg
  - Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, 20246, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
- Tanja Zeller
  - Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, 20246, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
- Andreas Ziegler
  - Institute of Medical Biometry and Statistics, University Medical Center Schleswig-Holstein, Campus Luebeck, Lübeck, 23562, Germany
  - German Center for Cardiovascular Research (DZHK e.V.), partner site Hamburg, Lübeck, Kiel, 20246, Germany
  - Center for Clinical Trials, University of Lübeck, Lübeck, 23562, Germany
16
Talhouk A, Kommoss S, Mackenzie R, Cheung M, Leung S, Chiu DS, Kalloger SE, Huntsman DG, Chen S, Intermaggio M, Gronwald J, Chan FC, Ramus SJ, Steidl C, Scott DW, Anglesio MS. Single-Patient Molecular Testing with NanoString nCounter Data Using a Reference-Based Strategy for Batch Effect Correction. PLoS One 2016; 11:e0153844. [PMID: 27096160 PMCID: PMC4838303 DOI: 10.1371/journal.pone.0153844] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 11/30/2015] [Accepted: 04/05/2016] [Indexed: 12/20/2022]
Abstract
A major weakness in many high-throughput genomic studies is the lack of consideration of a clinical environment where one patient at a time must be evaluated. We examined generalizable and platform-specific sources of variation in NanoString gene expression data from both ovarian cancer and Hodgkin lymphoma patients. A reference-based strategy, applicable to single-patient molecular testing, is proposed for batch effect correction. The proposed protocol improved performance in an established Hodgkin lymphoma classifier, reducing batch-to-batch misclassification while retaining accuracy and precision. We suggest this strategy may facilitate development of NanoString and similar molecular assays by accelerating prospective validation and clinical uptake of relevant diagnostics.
Affiliation(s)
- Aline Talhouk
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Stefan Kommoss
  - Department of Women’s Health, University Hospital Tuebingen, Tuebingen, Germany
- Robertson Mackenzie
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Martin Cheung
  - British Columbia Centre for Disease Control, Vancouver, Canada
- Samuel Leung
  - Genetic Pathology Evaluation Centre (GPEC), Vancouver General Hospital and The University of British Columbia, Vancouver, Canada
- Derek S. Chiu
  - Department of Statistics, University of British Columbia, Vancouver, Canada
- Steve E. Kalloger
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
  - Pancreas Centre BC, Vancouver, Canada
- David G. Huntsman
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
  - Genetic Pathology Evaluation Centre (GPEC), Vancouver General Hospital and The University of British Columbia, Vancouver, Canada
  - Department of Obstetrics and Gynaecology, University of British Columbia, Vancouver, Canada
- Stephanie Chen
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, United States of America
- Maria Intermaggio
  - School of Women's and Children's Health, University of New South Wales, Sydney, NSW, Australia
- Jacek Gronwald
  - Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
- Fong C. Chan
  - Centre for Lymphoid Cancer, British Columbia Cancer Agency Cancer Research Centre, Vancouver, Canada
- Susan J. Ramus
  - School of Women's and Children's Health, University of New South Wales, Sydney, NSW, Australia
  - The Kinghorn Cancer Centre, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Christian Steidl
  - Centre for Lymphoid Cancer, British Columbia Cancer Agency Cancer Research Centre, Vancouver, Canada
- David W. Scott
  - Centre for Lymphoid Cancer, British Columbia Cancer Agency Cancer Research Centre, Vancouver, Canada
- Michael S. Anglesio
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
  - Department of Obstetrics and Gynaecology, University of British Columbia, Vancouver, Canada
17
Comparative Analysis of Matrix Metalloproteinase Family Members Reveals That MMP9 Predicts Survival and Response to Temozolomide in Patients with Primary Glioblastoma. PLoS One 2016; 11:e0151815. [PMID: 27022952 PMCID: PMC4811585 DOI: 10.1371/journal.pone.0151815] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 11/20/2014] [Accepted: 03/04/2016] [Indexed: 12/22/2022]
Abstract
Background: Glioblastoma multiforme (GBM) is the most common malignant primary brain tumor in adults. Radiotherapy plus concomitant and adjuvant temozolomide (TMZ) chemotherapy is the current standard of care for patients with GBM. Matrix metalloproteinases (MMPs), a family of zinc-dependent endopeptidases, are key modulators of tumor invasion and metastasis due to their ECM degradation capacity. The aim of the present study was to identify the most informative MMP member in terms of prognostic and predictive ability for patients with primary GBM. Method: The mRNA expression profiles of all MMP genes were obtained from the Chinese Glioma Genome Atlas (CGGA), the Repository for Molecular Brain Neoplasia Data (REMBRANDT) and the GSE16011 dataset. MGMT methylation status was also examined by pyrosequencing. The correlation of MMP9 expression with tumor progression was explored in glioma specimens of all grades. Kaplan–Meier analysis and Cox proportional hazards regression models were used to investigate the association of MMP9 expression with survival and response to TMZ. Results: MMP9 was the only significant prognostic factor in three datasets for primary glioblastoma patients. Our results indicated that MMP9 expression is correlated with glioma grade (p<0.0001). Additionally, low expression of MMP9 was correlated with better survival outcome (OS: p = 0.0012 and PFS: p = 0.0066), and MMP9 was an independent prognostic factor in primary GBM (OS: p = 0.027 and PFS: p = 0.032). Furthermore, the GBM patients with low MMP9 expression benefited from TMZ chemotherapy regardless of the MGMT methylation status. Conclusions: Patients with primary GBMs with low MMP9 expression may have longer survival and may benefit from TMZ chemotherapy.
18
Golf O, Muirhead LJ, Speller A, Balog J, Abbassi-Ghadi N, Kumar S, Mróz A, Veselkov K, Takáts Z. XMS: cross-platform normalization method for multimodal mass spectrometric tissue profiling. J Am Soc Mass Spectrom 2015; 26:44-54. [PMID: 25380777 DOI: 10.1007/s13361-014-0997-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/19/2014] [Revised: 09/01/2014] [Accepted: 09/02/2014] [Indexed: 06/04/2023]
Abstract
Here we present a proof-of-concept cross-platform normalization approach to convert raw mass spectra acquired by distinct desorption ionization methods and/or instrumental setups to cross-platform normalized analyte profiles. The initial step of the workflow is database-driven peak annotation followed by summarization of peak intensities of different ions from the same molecule. The resulting compound-intensity spectra are adjusted to a method-independent intensity scale by using predetermined, compound-specific normalization factors. The method is based on the assumption that distinct MS-based platforms capture a similar set of chemical species in a biological sample, though these species may exhibit platform-specific molecular ion intensity distribution patterns. The method was validated on two sample sets of (1) porcine tissue analyzed by laser desorption ionization (LDI), desorption electrospray ionization (DESI), and rapid evaporative ionization mass spectrometry (REIMS) in combination with Fourier transform-based mass spectrometry; and (2) healthy/cancerous colorectal tissue analyzed by DESI and REIMS, with the latter being combined with time-of-flight mass spectrometry. We demonstrate the capacity of our method to reduce MS-platform specific variation, resulting in (1) high inter-platform concordance coefficients of analyte intensities; (2) clear principal component based clustering of analyte profiles according to histological tissue types, irrespective of the desorption ionization technique or mass spectrometer used; and (3) accurate "blind" classification of histologic tissue types using cross-platform normalized analyte profiles.
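The two workflow steps named in this abstract, summing the intensities of different ions annotated to the same molecule and then rescaling each compound by a predetermined factor, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout, the ion and compound labels, and the choice of a multiplicative factor are all hypothetical, since the abstract does not specify how the normalization factors are applied.

```python
from collections import defaultdict


def summarize_by_compound(peaks, annotation):
    """Sum the intensities of all ions annotated to the same compound.

    peaks:      dict mapping ion label -> measured intensity
    annotation: dict mapping ion label -> compound name (database lookup result)
    Ions without an annotation are dropped.
    """
    totals = defaultdict(float)
    for ion, intensity in peaks.items():
        compound = annotation.get(ion)
        if compound is not None:
            totals[compound] += intensity
    return dict(totals)


def apply_normalization_factors(compound_intensities, factors):
    """Rescale each compound to a platform-independent scale.

    Assumes, for illustration only, that the predetermined compound-specific
    factor is multiplicative. Compounds without a factor are dropped.
    """
    return {
        compound: intensity * factors[compound]
        for compound, intensity in compound_intensities.items()
        if compound in factors
    }


# Hypothetical toy spectrum: two adduct ions of lipid_A, one ion of lipid_B.
peaks = {"ion_1": 1.0, "ion_2": 2.0, "ion_3": 4.0}
annotation = {"ion_1": "lipid_A", "ion_2": "lipid_A", "ion_3": "lipid_B"}
platform_factors = {"lipid_A": 2.0, "lipid_B": 0.5}  # hypothetical factors for one platform

profile = apply_normalization_factors(
    summarize_by_compound(peaks, annotation), platform_factors
)
print(profile)  # {'lipid_A': 6.0, 'lipid_B': 2.0}
```

Keeping one factor table per platform, as in `platform_factors` here, would let profiles from different instruments be mapped onto the same scale before comparison.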
Affiliation(s)
- Ottmar Golf
  - Institute for Inorganic and Analytical Chemistry, Justus Liebig University, Giessen, Germany
19
Parker HS, Corrada Bravo H, Leek JT. Removing batch effects for prediction problems with frozen surrogate variable analysis. PeerJ 2014; 2:e561. [PMID: 25332844 PMCID: PMC4179553 DOI: 10.7717/peerj.561] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 06/20/2014] [Accepted: 08/15/2014] [Indexed: 01/06/2023]
Abstract
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
Collapse
Affiliation(s)
- Hilary S. Parker
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Héctor Corrada Bravo
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Jeffrey T. Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
20
|
Cox C, Sharp FR. RNA-based blood genomics as an investigative tool and prospective biomarker for ischemic stroke. Neurol Res 2013; 35:457-64. [DOI: 10.1179/1743132813y.0000000212] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/31/2022]
|
21
|
Whitmore SS, Braun TA, Skeie JM, Haas CM, Sohn EH, Stone EM, Scheetz TE, Mullins RF. Altered gene expression in dry age-related macular degeneration suggests early loss of choroidal endothelial cells. Mol Vis 2013; 19:2274-97. [PMID: 24265543 PMCID: PMC3834599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2013] [Accepted: 11/14/2013] [Indexed: 10/26/2022]
Abstract
PURPOSE Age-related macular degeneration (AMD) is a major cause of blindness in developed countries. The molecular pathogenesis of early events in AMD is poorly understood. We investigated differential gene expression in samples of human retinal pigment epithelium (RPE) and choroid from early AMD and control maculas with exon-based arrays. METHODS Gene expression levels in nine human donor eyes with early AMD and nine control human donor eyes were assessed using Affymetrix Human Exon ST 1.0 arrays. Two controls did not pass quality control and were removed. Differentially expressed genes were annotated using the Database for Annotation, Visualization and Integrated Discovery (DAVID), and gene set enrichment analysis (GSEA) was performed on RPE-specific and endothelium-associated gene sets. The complement factor H (CFH) genotype was also assessed, and differential expression was analyzed regarding high AMD risk (YH/HH) and low AMD risk (YY) genotypes. RESULTS Seventy-five genes were identified as differentially expressed (raw p value <0.01; ≥50% fold change, mean log2 expression level in AMD or control ≥ median of all average gene expression values); however, no genes were significant (adj. p value <0.01) after correction for multiple hypothesis testing. Of 52 genes with decreased expression in AMD (fold change <0.5; raw p value <0.01), 18 genes were identified by DAVID analysis as associated with vision or neurologic processes. The GSEA of the RPE-associated and endothelium-associated genes revealed a significant decrease in genes typically expressed by endothelial cells in the early AMD group compared to controls, consistent with previous histologic and proteomic studies. Analysis of the CFH genotype indicated decreased expression of ADAMTS9 in eyes with high-risk genotypes (fold change = -2.61; raw p value=0.0008). 
CONCLUSIONS GSEA results suggest that RPE transcripts are preserved or elevated in early AMD, concomitant with loss of endothelial cell marker expression. These results are consistent with the notion that choroidal endothelial cell dropout or dedifferentiation occurs early in the pathogenesis of AMD.
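The gap between the raw and adjusted p values reported in this abstract is typical when many genes are tested at once. The abstract does not say which multiple-testing correction was used; the sketch below assumes the Benjamini-Hochberg procedure purely for illustration, and the function name and example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Assumption: BH is used here only as a common example of a
    multiple-testing correction; the study's actual method is not stated.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

With thousands of genes instead of four, the multiplier `m / rank` grows accordingly, which is how a gene with raw p < 0.01 can fail an adjusted threshold of 0.01.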
Affiliation(s)
- S. Scott Whitmore
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
| | - Terry A. Braun
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA; Department of Biomedical Engineering, The University of Iowa, Iowa City, IA
| | - Jessica M. Skeie
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
| | - Christine M. Haas
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
| | - Elliott H. Sohn
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
| | - Edwin M. Stone
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
| | - Todd E. Scheetz
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA; Department of Biomedical Engineering, The University of Iowa, Iowa City, IA
| | - Robert F. Mullins
- Department of Ophthalmology and Visual Sciences, The University of Iowa, Iowa City, IA; Stephen A. Wynn Institute for Vision Research, The University of Iowa, Iowa City, IA
|
22
|
Lee J, Lee S. Cross Platform Data Analysis in Microarray Experiment. KOREAN JOURNAL OF APPLIED STATISTICS 2013. [DOI: 10.5351/kjas.2013.26.2.307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
23
|
Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, Weiss-Solís DY, Duque R, Bersini H, Nowé A. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform 2012; 14:469-90. [PMID: 22851511 DOI: 10.1093/bib/bbs037] [Citation(s) in RCA: 205] [Impact Index Per Article: 17.1] [Indexed: 12/31/2022]
Abstract
Genomic data integration is a key goal to be achieved towards large-scale genomic data analysis. This process is very challenging due to the diverse sources of information resulting from genomics experiments. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It has been acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effects'. The methods reviewed here perform data integration by removing (or more precisely attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are mandatory in assessing the efficiency and the quality of the data integration process. We provide a systematic description of the MAGE data integration methodology together with some basic recommendations to help users choose the appropriate tools to integrate MAGE data for large-scale analysis, and to evaluate them from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB http://insilico.ulb.ac.be.
Affiliation(s)
- Cosmin Lazar
- Como, Vrije Universiteit Brussel, Pleinlaanz, 1050 Brussels, Belgium.
|
24
|
Gregori J, Villarreal L, Méndez O, Sánchez A, Baselga J, Villanueva J. Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. J Proteomics 2012; 75:3938-51. [PMID: 22588121 DOI: 10.1016/j.jprot.2012.05.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 03/06/2012] [Revised: 04/27/2012] [Accepted: 05/02/2012] [Indexed: 02/04/2023]
Abstract
Shotgun proteomics has become the standard proteomics technique for the large-scale measurement of protein abundances in biological samples. Although quantitative proteomics has usually been performed using label-based approaches, label-free quantitation offers advantages related to the avoidance of labeling steps, no limitation in the number of samples to be compared, and the gain in protein detection sensitivity. However, since samples are analyzed separately, experimental design becomes critical. The exploration of spectral counting quantitation based on LC-MS presented here gathers experimental evidence of the influence of batch effects on comparative proteomics. The batch effects shown with spiking experiments clearly interfere with the biological signal. In order to minimize the interferences from batch effects, a statistical correction is proposed and implemented. Our results show that batch effects can be attenuated statistically when proper experimental design is used. Furthermore, the batch effect correction implemented leads to a substantial increase in the sensitivity of statistical tests. Finally, the applicability of our batch effects correction is shown on two different biomarker discovery projects involving cancer secretomes. We think that our findings will allow designing and executing better comparative proteomics projects and will help to avoid reaching false conclusions in the field of proteomics biomarker discovery.
Affiliation(s)
- Josep Gregori
- Vall d'Hebron Institut of Oncology, Barcelona, Spain
|
25
|
Rudy J, Valafar F. Empirical comparison of cross-platform normalization methods for gene expression data. BMC Bioinformatics 2011; 12:467. [PMID: 22151536 PMCID: PMC3314675 DOI: 10.1186/1471-2105-12-467] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 05/24/2011] [Accepted: 12/07/2011] [Indexed: 12/13/2022]
Abstract
Background Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow. Results Using two publicly available cross-platform testing data sets, cross-platform normalization methods are compared based on inter-platform concordance and on the consistency of gene lists obtained with transformed data. Scatter and ROC-like plots are produced and new statistics based on those plots are introduced to measure the effectiveness of each method. Bootstrapping is employed to obtain distributions for those statistics. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets. Conclusions Our comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. We provide an R package, CONOR, capable of performing the nine cross-platform normalization methods considered. The package can be downloaded at http://alborz.sdsu.edu/conor and is available from CRAN.
Affiliation(s)
- Jason Rudy
- Biomedical Informatics Research Center, San Diego State University, 5500 Campanile Dr, San Diego, CA, USA
|
26
|
Li X, LeBlanc J, Truong A, Vuthoori R, Chen SS, Lustgarten JL, Roth B, Allard J, Ippoliti A, Presley LL, Borneman J, Bigbee WL, Gopalakrishnan V, Graeber TG, Elashoff D, Braun J, Goodglick L. A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface. PLoS One 2011; 6:e26542. [PMID: 22132074 PMCID: PMC3221670 DOI: 10.1371/journal.pone.0026542] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 05/26/2011] [Accepted: 09/28/2011] [Indexed: 01/03/2023]
Abstract
Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects, and subjected them to high-throughput proteomics. The spectral data were subjected to a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and then were evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem, and a fresh understanding of the basic biology and disease processes at the MLI.
Affiliation(s)
- Xiaoxiao Li
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - James LeBlanc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Allison Truong
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Ravi Vuthoori
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Sharon S. Chen
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Jonathan L. Lustgarten
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Bennett Roth
- Department of Medicine, Division of Digestive Disease, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Jeff Allard
- Department of Medicine, Division of Digestive Disease, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Andrew Ippoliti
- Inflammatory Bowel Disease Center, Cedars-Sinai Medical Center, Los Angeles, California, United States of America
| | - Laura L. Presley
- Department of Plant Pathology and Microbiology, University of California Riverside, Riverside, California, United States of America
| | - James Borneman
- Department of Plant Pathology and Microbiology, University of California Riverside, Riverside, California, United States of America
| | - William L. Bigbee
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- University of Pittsburgh Cancer Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Vanathi Gopalakrishnan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
| | - Thomas G. Graeber
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - David Elashoff
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Department of Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Department of Biostatistics, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
| | - Jonathan Braun
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- * E-mail: (JB); (LG)
| | - Lee Goodglick
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California, United States of America
- * E-mail: (JB); (LG)
|
27
|
Magni P, Simeone A, Healy S, Isacchi A, Bosotti R. Summarizing probe intensities of Affymetrix GeneChip 3' expression arrays taking into account day-to-day variability. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1425-1430. [PMID: 21778528 DOI: 10.1109/tcbb.2010.82] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Microarray experiments are affected by several sources of variability. The paper demonstrates the major role of day-to-day variability, underlines the importance of a randomized block design when processing replicates over several days to avoid systematic biases, and proposes a simple algorithm that minimizes the day dependence.
|
28
|
Abstract
Whole genome expression microarrays can be used to study gene expression in blood, which comes in part from leukocytes, immature platelets, and red blood cells. Since these cells are important in the pathogenesis of stroke, RNA provides an index of these cellular responses to stroke. Our studies in rats have shown specific gene expression changes 24 hours after ischemic stroke, hemorrhage, status epilepticus, hypoxia, hypoglycemia, global ischemia, and following brief focal ischemia that simulated transient ischemic attacks in humans. Human studies show gene expression changes following ischemic stroke. These gene profiles predict a second cohort with >90% sensitivity and specificity. Gene profiles for ischemic stroke caused by large-vessel atherosclerosis and cardioembolism have been described that predict a second cohort with >85% sensitivity and specificity. Atherosclerotic genes were associated with clotting, platelets, and monocytes, and cardioembolic genes were associated with inflammation, infection, and neutrophils. These gene profiles predicted the cause of stroke in 58% of cryptogenic patients. These studies will provide diagnostic, prognostic, and therapeutic markers, and will advance our understanding of stroke in humans. New techniques to measure all coding and noncoding RNAs along with alternatively spliced transcripts will markedly advance molecular studies of human stroke.
|
29
|
Mendrick DL. Transcriptional profiling to identify biomarkers of disease and drug response. Pharmacogenomics 2011; 12:235-49. [PMID: 21332316 DOI: 10.2217/pgs.10.184] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 02/08/2023]
Abstract
The discovery, biological qualification and analytical validation of genomic biomarkers require extensive collaborations between individuals with expertise in biology, statistics, bioinformatics, chemistry, clinical medicine, regulatory science and so on. For clinical utility, blood-borne biomarkers (e.g., mRNA and miRNA) of organ damage, drug toxicity and/or response would be preferred to those that are tissue based. Currently used biomarkers such as serum creatinine (indicating renal dysfunction) denote organ damage whether caused by disease, physical injury or drugs. Therefore, it is anticipated that studies of disease will discover biomarkers that can also be used to identify drug-induced injury and vice versa. This article describes transcriptomic blood-borne biomarkers that have been reported to be connected with disease and drug toxicity. Much more qualification and validation needs to be carried out before many of these biomarkers can prove useful. Discussed here are some of the lessons learned and roadblocks to success.
Affiliation(s)
- Donna L Mendrick
- Division of Systems Biology, HFT-230, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72079-4502, USA.
|
30
|
de Groot JF, Lamborn KR, Chang SM, Gilbert MR, Cloughesy TF, Aldape K, Yao J, Jackson EF, Lieberman F, Robins HI, Mehta MP, Lassman AB, Deangelis LM, Yung WKA, Chen A, Prados MD, Wen PY. Phase II study of aflibercept in recurrent malignant glioma: a North American Brain Tumor Consortium study. J Clin Oncol 2011; 29:2689-95. [PMID: 21606416 DOI: 10.1200/jco.2010.34.1636] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.8] [Indexed: 01/08/2023]
Abstract
PURPOSE Antivascular endothelial growth factor (anti-VEGF) therapy is a promising treatment approach for patients with recurrent glioblastoma. This single-arm phase II study evaluated the efficacy of aflibercept (VEGF Trap), a recombinantly produced fusion protein that scavenges both VEGF and placental growth factor in patients with recurrent malignant glioma. PATIENTS AND METHODS Forty-two patients with glioblastoma and 16 patients with anaplastic glioma who had received concurrent radiation and temozolomide and adjuvant temozolomide were enrolled at first relapse. Aflibercept 4 mg/kg was administered intravenously on day 1 of every 2-week cycle. RESULTS The 6-month progression-free survival rate was 7.7% for the glioblastoma cohort and 25% for patients with anaplastic glioma. Overall radiographic response rate was 24% (18% for glioblastoma and 44% for anaplastic glioma). The median progression-free survival was 24 weeks for patients with anaplastic glioma (95% CI, 5 to 31 weeks) and 12 weeks for patients with glioblastoma (95% CI, 8 to 16 weeks). A total of 14 patients (25%) were removed from the study for toxicity, on average less than 2 months from treatment initiation. The main treatment-related National Cancer Institute Common Terminology Criteria grades 3 and 4 adverse events (38 total) included fatigue, hypertension, and lymphopenia. Two grade 4 CNS ischemias and one grade 4 systemic hemorrhage were reported. Aflibercept rapidly decreases permeability on dynamic contrast enhanced magnetic resonance imaging, and molecular analysis of baseline tumor tissue identified tumor-associated markers of response and resistance. CONCLUSION Aflibercept monotherapy has moderate toxicity and minimal evidence of single-agent activity in unselected patients with recurrent malignant glioma.
Affiliation(s)
- John F de Groot
- The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
|
31
|
Cacciottolo M, Belcastro V, Laval S, Bushby K, di Bernardo D, Nigro V. Reverse engineering gene network identifies new dysferlin-interacting proteins. J Biol Chem 2011; 286:5404-13. [PMID: 21119217 PMCID: PMC3037653 DOI: 10.1074/jbc.m110.173559] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 08/10/2010] [Revised: 11/29/2010] [Indexed: 01/28/2023]
Abstract
Dysferlin (DYSF) is a type II transmembrane protein implicated in surface membrane repair of muscle. Mutations in dysferlin lead to Limb Girdle Muscular Dystrophy 2B (LGMD2B), Miyoshi Myopathy (MM), and Distal Myopathy with Anterior Tibialis onset (DMAT). The DYSF protein complex is not well understood, and only a few protein-binding partners have been identified thus far. To increase the set of interacting protein partners for DYSF, we recovered a list of predicted interacting proteins through a systems biology approach. The predictions are part of a "reverse-engineered" genome-wide human gene regulatory network obtained from experimental data by computational analysis. The reverse-engineering algorithm behind the analysis relates genes to each other based on changes in their expression patterns. DYSF and AHNAK were used to query the system and extract lists of potential interacting proteins. Among the 32 predictions the two genes share, we validated the physical interaction of DYSF protein with moesin (MSN) and polymerase I and transcript release factor (PTRF) in mouse heart lysate, thus identifying two novel Dysferlin-interacting proteins. Our strategy could be useful to clarify Dysferlin function in intracellular vesicles and its implication in muscle membrane resealing.
Affiliation(s)
- Mafalda Cacciottolo
- From the TIGEM-Telethon Institute of Genetics and Medicine, 80131 Naples, Italy
| | - Vincenzo Belcastro
- From the TIGEM-Telethon Institute of Genetics and Medicine, 80131 Naples, Italy
| | - Steve Laval
- the Institute of Human Genetics, Newcastle University, NE1 3BZ Newcastle Upon Tyne, United Kingdom, and
| | - Kate Bushby
- the Institute of Human Genetics, Newcastle University, NE1 3BZ Newcastle Upon Tyne, United Kingdom, and
| | - Diego di Bernardo
- From the TIGEM-Telethon Institute of Genetics and Medicine, 80131 Naples, Italy
| | - Vincenzo Nigro
- From the TIGEM-Telethon Institute of Genetics and Medicine, 80131 Naples, Italy
- the Laboratorio di Genetica Medica, Dipartimento di Patologia Generale and CIRM, Seconda Università degli Studi di Napoli, 80138 Naples, Italy
|
32
|
Luo J, Schumacher M, Scherer A, Sanoudou D, Megherbi D, Davison T, Shi T, Tong W, Shi L, Hong H, Zhao C, Elloumi F, Shi W, Thomas R, Lin S, Tillinghast G, Liu G, Zhou Y, Herman D, Li Y, Deng Y, Fang H, Bushel P, Woods M, Zhang J. A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. THE PHARMACOGENOMICS JOURNAL 2010; 10:278-91. [PMID: 20676067 PMCID: PMC2920074 DOI: 10.1038/tpj.2010.57] [Citation(s) in RCA: 192] [Impact Index Per Article: 13.7] [Indexed: 12/12/2022]
Abstract
Batch effects are the systematic non-biological differences between batches (groups) of samples in microarray experiments due to various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on the development of methods for effective batch effects removal. However, their impact on cross-batch prediction performance, which is one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using Support vector machines (SVM) and K nearest neighbors (KNN) as classifiers and Matthews correlation coefficient (MCC) as performance metric, we find that Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform better or equivalent to no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that the application of these methods is generally advisable and ratio-based methods are preferred.
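Two of the simpler corrections named in this abstract, mean-centering and standardization, together with the Matthews correlation coefficient used as the performance metric, can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the MAQC-II code: the function names are hypothetical, the population standard deviation is an arbitrary choice, and the ratio-based methods (Ratio-G, Ratio-A) and EJLR are not covered.

```python
from math import sqrt
from statistics import mean, pstdev


def _group_by_batch(values, batches):
    """Collect the expression values of one gene per batch label."""
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    return groups


def mean_center(values, batches):
    """Subtract each batch's mean so every batch is centered at zero."""
    means = {b: mean(vs) for b, vs in _group_by_batch(values, batches).items()}
    return [v - means[b] for v, b in zip(values, batches)]


def standardize(values, batches):
    """Center per batch and divide by the per-batch standard deviation.

    Assumption: population SD is used; a batch with zero spread maps to 0.0.
    """
    groups = _group_by_batch(values, batches)
    means = {b: mean(vs) for b, vs in groups.items()}
    sds = {b: pstdev(vs) for b, vs in groups.items()}
    return [
        (v - means[b]) / sds[b] if sds[b] > 0 else 0.0
        for v, b in zip(values, batches)
    ]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up values for one gene measured in two batches.
values = [1.0, 3.0, 10.0, 14.0]
batches = ["a", "a", "b", "b"]
print(mean_center(values, batches))   # [-1.0, 1.0, -2.0, 2.0]
print(standardize(values, batches))
print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # 1.0
```

In a cross-batch prediction setting, the correction would be applied gene by gene before training a classifier on one batch and scoring it on another with the MCC.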
Affiliation(s)
- J Luo
- Systems Analytics Inc., Waltham, MA, USA
|
33
|
Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles. BMC Genomics 2010; 11:134. [PMID: 20181233 PMCID: PMC2843619 DOI: 10.1186/1471-2164-11-134] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 11/24/2009] [Accepted: 02/24/2010] [Indexed: 11/25/2022]
Abstract
Background Microarray technology is a popular means of producing whole genome transcriptional profiles; however, high cost and scarcity of mRNA have led many studies to be conducted based on the analysis of single samples. We exploit the design of the Illumina platform, specifically multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study. Results A clear batch-specific bias was detected in the measured expressions of both the UHRR and clinical samples. This bias was found to persist following standard microarray normalisation techniques. However, when mean-centering or empirical Bayes batch-correction methods (ComBat) were applied to the data, inter-batch variation in the UHRR and clinical samples was greatly reduced. Correlation between replicate UHRR samples improved by two orders of magnitude following batch-correction using ComBat (ranging from 0.9833-0.9991 to 0.9997-0.9999) and increased the consistency of the gene-lists from the duplicate clinical samples, from 11.6% in quantile normalised data to 66.4% in batch-corrected data. The use of UHRR as an inter-batch calibrator provided a small additional benefit when used in conjunction with ComBat, further increasing the agreement between the two gene-lists, up to 74.1%. Conclusion In the interests of practicalities and cost, these results suggest that single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators appreciate the propensity for such variation in the design stages of a microarray experiment and that the use of suitable correction methods become routine during the statistical analysis of the data.
|
34
|
Glaab E, Garibaldi JM, Krasnogor N. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization. BMC Bioinformatics 2009; 10:358. [PMID: 19863798 PMCID: PMC2776026 DOI: 10.1186/1471-2105-10-358] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Received: 05/08/2009] [Accepted: 10/28/2009] [Indexed: 01/21/2023]
Abstract
Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
Affiliation(s)
- Enrico Glaab
- School of Computer Science, Nottingham University, Jubilee Campus, Wollaton Road, Nottingham, UK.
|
35
|
Stamova BS, Apperson M, Walker WL, Tian Y, Xu H, Adamczy P, Zhan X, Liu DZ, Ander BP, Liao IH, Gregg JP, Turner RJ, Jickling G, Lit L, Sharp FR. Identification and validation of suitable endogenous reference genes for gene expression studies in human peripheral blood. BMC Med Genomics 2009; 2:49. [PMID: 19656400 PMCID: PMC2736983 DOI: 10.1186/1755-8794-2-49] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Received: 01/12/2009] [Accepted: 08/05/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene expression studies require appropriate normalization methods. One such method uses stably expressed reference genes. Since suitable reference genes appear to be unique for each tissue, we have identified an optimal set of the most stably expressed genes in human blood that can be used for normalization. METHODS: Whole-genome Affymetrix Human 2.0 Plus arrays were examined from 526 samples of males and females ages 2 to 78, including control subjects and patients with Tourette syndrome, stroke, migraine, muscular dystrophy, and autism. The top 100 most stably expressed genes with a broad range of expression levels were identified. To validate the best candidate genes, we performed quantitative RT-PCR on a subset of 10 genes (TRAP1, DECR1, FPGS, FARP1, MAPRE2, PEX16, GINS2, CRY2, CSNK1G2 and A4GALT), 4 commonly employed reference genes (GAPDH, ACTB, B2M and HMBS) and PPIB, previously reported to be stably expressed in blood. Expression stability and ranking analysis were performed using GeNorm and NormFinder algorithms. RESULTS: Reference genes were ranked based on their expression stability. The minimum number of genes needed for normalization, as calculated using GeNorm, showed that the smallest set of most stably expressed genes needed for accurate normalization in RNA expression studies of human whole blood is a combination of TRAP1, FPGS, DECR1 and PPIB. We confirmed the ranking of the best candidate control genes by using an alternative algorithm (NormFinder). CONCLUSION: The reference genes identified in this study are stably expressed in whole blood of humans of both genders with multiple disease conditions and ages 2 to 78. Importantly, they also have different functions within cells and thus should be expressed independently of each other. These genes should be useful as normalization genes for microarray and RT-PCR whole blood studies of human physiology, metabolism and disease.
Affiliation(s)
- Boryana S Stamova
- Department of Neurology and M.I.N.D. Institute, University of California at Davis Medical Center, Sacramento, CA 95817, USA.
|
36
|
Gene expression in blood of subjects with Duchenne muscular dystrophy. Neurogenetics 2008; 10:117-25. [DOI: 10.1007/s10048-008-0167-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 09/02/2008] [Accepted: 11/26/2008] [Indexed: 10/21/2022]
|