1
Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023; 10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features, such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis, so the outputs of such normalizations are poorly compatible with each other. We recently proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape: every expression profile is transformed into a pre-defined universal format. We previously showed that after shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested how well Shambhala retains fold-change gene expression features and other functional characteristics of gene clusters, such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with flexible output data format. These criteria covered biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs. validation datasets and less than half the instability in calculated drug efficiency scores compared to other methods.
Affiliation(s)
- Nicolas Borisov
- Omicsway Corp, Walnut, CA, United States
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander Simonov
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Oncobox Ltd., Moscow, Russia
- Maxim Sorokin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Oncobox Ltd., Moscow, Russia
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
- Clinic for Neurosurgery, Laboratory of Experimental Neurooncology, Johannes Gutenberg University Medical Centre, Mainz, Germany
- Denis Kuzmin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Betul Karademir-Yilmaz
- Department of Biochemistry, School of Medicine/Genetic and Metabolic Diseases Research and Investigation Center (GEMHAM) Marmara University, Istanbul, Türkiye
- Anton Buzdin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
2
Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022; 10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also raised the question of how to compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, whose usefulness is, however, strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages, which can be generally classified as either flexible or predefined format harmonizers, have been proposed, but none has to date become the gold standard for unification of this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced not only for comparisons of limited transcriptomic datasets. Instead, instruments were proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable calculation cost. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we attempted to overview the landscape of modern approaches and methods in transcriptomic harmonization and focused on the practical aspects of their application.
Affiliation(s)
- Nicolas Borisov
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Anton Buzdin
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
3
Huang HH, Rao H, Miao R, Liang Y. A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression. BMC Bioinformatics 2022; 23:353. [PMID: 35999505 PMCID: PMC9396780 DOI: 10.1186/s12859-022-04887-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 12/22/2022] Open
Abstract
Background: Gene expression analysis can provide useful information for analyzing complex biological mechanisms. However, many reported findings are unrepeatable due to small sample sizes relative to a large number of genes and the low signal-to-noise ratios of most gene expression datasets. Results: Meta-analysis of multi-data sets is an efficient method for tackling the above problem. To improve the performance of meta-analysis, we propose a novel meta-analysis framework. It consists of two parts: (1) a novel data augmentation strategy. Various cross-platform normalization methods exist, which can preserve original biological information of gene expression datasets from different angles and add different “perturbations” to the dataset. Using such perturbation, we provide a feasible means for gene expression data augmentation; (2) elastic data shared lasso (DSL-L2). The DSL-L2 method spans the continuum between individual models for each dataset and one model for all datasets. It also overcomes the shortcomings of the data shared lasso method when dealing with highly correlated features. Comprehensive simulation experiment results show that the proposed method has high prediction and gene selection performance. We then apply the proposed method to non-small cell lung cancer (NSCLC) blood gene expression data in order to identify key tumor-related genes. The outcomes of our experiment indicate that the method could be used for identifying a set of robust disease-related gene signatures that may be used for NSCLC early diagnosis or prognosis or even targeting. Conclusion: We propose a novel and effective meta-analysis method for biological research, extrapolating and integrating information from multiple gene expression datasets.
Affiliation(s)
- Hai-Hui Huang
- Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Hao Rao
- Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Rui Miao
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Yong Liang
- The Peng Cheng Laboratory, Shenzhen, China
4
Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022; 2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)
Affiliation(s)
- Nicolas Borisov
- Omicsway Corp., Walnut, California
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Maksim Sorokin
- Omicsway Corp., Walnut, California
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Marianna Zolotovskaya
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Oncobox Ltd., Moscow, Russia
- Anton Buzdin
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
5
Borisov N, Sorokin M, Garazha A, Buzdin A. Quantitation of Molecular Pathway Activation Using RNA Sequencing Data. Methods Mol Biol 2020; 2063:189-206. [PMID: 31667772 DOI: 10.1007/978-1-0716-0138-9_15] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 06/10/2023]
Abstract
Intracellular molecular pathways (IMPs) control all major events in the living cell. IMPs are considered hotspots in biomedical sciences, and thousands of IMPs have been discovered for humans and model organisms. Knowledge of IMP activation is essential for understanding biological functions and differences between biological objects at the molecular level. Here we describe the Oncobox system for accurate quantitative scoring of the activities of up to several thousand molecular pathways based on high throughput molecular data. Although initially designed for gene expression data, mainly from RNA sequencing, Oncobox is now also applicable to quantitative proteomics, microRNA, and transcription factor binding site mapping data. The Oncobox system includes modules for gene expression data harmonization, aggregation, and comparison, and a recursive algorithm for automatic annotation of molecular pathways. The universal rationale of Oncobox enables scoring of signaling, metabolic, cytoskeleton, immunity, DNA repair, and other pathways in a multitude of biological objects. The Oncobox system can be helpful to all those working in the fields of genetics, biochemistry, interactomics, and big data analytics in molecular biomedicine.
Affiliation(s)
- Nicolas Borisov
- Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, USA
- Maxim Sorokin
- Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, USA
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Anton Buzdin
- Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, USA
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
6
Buzdin A, Sorokin M, Garazha A, Glusker A, Aleshin A, Poddubskaya E, Sekacheva M, Kim E, Gaifullin N, Giese A, Seryakov A, Rumiantsev P, Moshkovskii S, Moiseev A. RNA sequencing for research and diagnostics in clinical oncology. Semin Cancer Biol 2019; 60:311-323. [PMID: 31412295 DOI: 10.1016/j.semcancer.2019.07.010] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 06/30/2019] [Accepted: 07/16/2019] [Indexed: 12/26/2022]
Abstract
Molecular diagnostics is becoming one of the major drivers of personalized oncology. With hundreds of different approved anticancer drugs and regimens for their administration, selecting the proper treatment for a patient is, at the very least, a nontrivial task. This is especially true for recurrent and metastatic cancers where the standard lines of therapy have failed. Recent trials demonstrated that mutation assays are strongly limited in personalized selection of therapeutics; consequently, most drugs cannot be ranked and only a small percentage of patients can benefit from the screening. Other approaches are, therefore, needed to address the problem of finding proper targeted therapies. The analysis of RNA expression (transcriptomic) profiles presents a reasonable solution because transcriptomics stands a few steps closer to the tumor phenotype than genome analysis. Several recent studies pioneered the use of transcriptomics in practical oncology and showed truly encouraging clinical results. The possibility of directly measuring the expression levels of drugs' molecular targets and profiling activation of the relevant molecular pathways enables personalized prioritization for all types of molecular-targeted therapies. RNA sequencing is the most robust tool for high throughput quantitative transcriptomics. Its use, potential, and limitations for clinical oncology are reviewed here, along with technical aspects such as optimal types of biosamples, RNA sequencing profile normalization, quality controls, and several levels of data analysis.
Affiliation(s)
- Anton Buzdin
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Maxim Sorokin
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Alex Aleshin
- Stanford University School of Medicine, Stanford, 94305, CA, USA
- Elena Poddubskaya
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Vitamed Oncological Clinics, Moscow, Russia
- Marina Sekacheva
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
- Johannes Gutenberg University Mainz, Mainz, Germany
- Nurshat Gaifullin
- Lomonosov Moscow State University, Faculty of Medicine, Moscow, Russia
- Sergey Moshkovskii
- Institute of Biomedical Chemistry, Moscow, 119121, Russia; Pirogov Russian National Research Medical University (RNRMU), Moscow, 117997, Russia
- Alexey Moiseev
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
7
Malatras A, Duguez S, Duddy W. Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field. Skelet Muscle 2019; 9:10. [PMID: 31053169 PMCID: PMC6498474 DOI: 10.1186/s13395-019-0196-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/06/2019] [Accepted: 04/09/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: The approach of building large collections of gene sets and then systematically testing hypotheses across these collections is a powerful tool in functional genomics, both in the pathway analysis of omics data and in uncovering the polygenic effects associated with complex diseases in genome-wide association studies. The Molecular Signatures Database includes collections of oncogenic and immunologic signatures enabling researchers to compare transcriptional datasets across hundreds of previous studies and leading to important insights in these fields, but such a resource does not currently exist for neuromuscular research. In previous work, we have shown the utility of gene set approaches to understand muscle cell physiology and pathology. METHODS: Following a systematic survey of public muscle data, we passed gene expression profiles from 4305 samples through a robust pre-processing and standardized data analysis pipeline. Two hundred eighty-two samples were discarded based on a battery of rigorous global quality controls. From among the remaining studies, 578 comparisons of interest were identified by a combination of text mining and manual curation of the study meta-data. For each comparison, significantly dysregulated genes (FDR adjusted p < 0.05) were identified. RESULTS: Lists of dysregulated genes were divided between upregulated and downregulated to give 1156 Muscle Gene Sets (MGS). This resource is available for download (www.sys-myo.com/muscle_gene_sets) and is accessible through three commonly used functional genomics platforms (GSEA, EnrichR, and WebGestalt). Basic guidance and recommendations are provided for the use of MGS through these platforms. In addition, consensus muscle gene sets were created to capture the overlap between the results of similar studies, and analysis of these highlighted the potential for novel disease-relevant findings.
CONCLUSIONS: The MGS resource can be used to investigate the behaviour of any list of genes across previous comparisons of muscle conditions, to compare previous studies to one another, and to explore the functional relationship of muscle dysregulation to the Gene Ontology. Its major intended use is in enrichment testing for functional genomics analysis.
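The "FDR adjusted p < 0.05" filter mentioned in the abstract above can be illustrated with a short sketch. The abstract does not say which FDR procedure was used, so this assumes the common Benjamini-Hochberg adjustment; the function names `bh_adjust` and `significant_genes` and the example gene labels are hypothetical, not taken from the MGS pipeline.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the abstract only says "FDR adjusted"; BH is used here
    purely as an illustration of one standard FDR procedure.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, alpha=0.05):
    """Return the genes whose BH-adjusted p-value is below alpha."""
    genes = list(pvalues_by_gene)
    adjusted = bh_adjust([pvalues_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < alpha]


# Hypothetical raw p-values for three genes in one comparison.
hits = significant_genes({"A": 0.001, "B": 0.8, "C": 0.02})
print(hits)  # ['A', 'C']
```

Splitting the resulting list by the sign of each gene's fold change would then give the upregulated and downregulated sets described in the abstract.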
Affiliation(s)
- Apostolos Malatras
- Myologie Centre de Recherche, Université Sorbonne, UMRS 974 UPMC, INSERM, FRE 3617 CNRS, AIM, Paris, France
- Northern Ireland Centre for Stratified Medicine, Biomedical Sciences Research Institute, C-TRIC, Ulster University, Altnagelvin Hospital Campus, Glenshane Road, Derry/Londonderry, BT47 6SB UK
- Department of Biological Sciences, Molecular Medicine Research Center, University of Cyprus, 1 University Avenue, 2109 Nicosia, Cyprus
- Stephanie Duguez
- Myologie Centre de Recherche, Université Sorbonne, UMRS 974 UPMC, INSERM, FRE 3617 CNRS, AIM, Paris, France
- Northern Ireland Centre for Stratified Medicine, Biomedical Sciences Research Institute, C-TRIC, Ulster University, Altnagelvin Hospital Campus, Glenshane Road, Derry/Londonderry, BT47 6SB UK
- William Duddy
- Myologie Centre de Recherche, Université Sorbonne, UMRS 974 UPMC, INSERM, FRE 3617 CNRS, AIM, Paris, France
- Northern Ireland Centre for Stratified Medicine, Biomedical Sciences Research Institute, C-TRIC, Ulster University, Altnagelvin Hospital Campus, Glenshane Road, Derry/Londonderry, BT47 6SB UK
8
Huang CT, Hsieh CH, Lee WC, Liu YL, Yang TS, Hsu WM, Oyang YJ, Huang HC, Juan HF. Therapeutic Targeting of Non-oncogene Dependencies in High-risk Neuroblastoma. Clin Cancer Res 2019; 25:4063-4078. [PMID: 30952635 DOI: 10.1158/1078-0432.ccr-18-4117] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/17/2018] [Revised: 02/17/2019] [Accepted: 03/28/2019] [Indexed: 11/16/2022]
Abstract
PURPOSE: Neuroblastoma is a pediatric malignancy of the sympathetic nervous system with diverse clinical behaviors. Genomic amplification of the MYCN oncogene has been shown to drive neuroblastoma pathogenesis and correlate with aggressive disease, but the survival rates for those high-risk tumors carrying no MYCN amplification remain equally dismal. The paucity of mutations and molecular heterogeneity have hindered the development of targeted therapies for most advanced neuroblastomas. We use an alternative method to identify potential drugs that target non-oncogene dependencies in high-risk neuroblastoma. EXPERIMENTAL DESIGN: By using a gene expression-based integrative approach, we identified prognostic signatures and potentially effective single agents and drug combinations for high-risk neuroblastoma. RESULTS: Among these predictions, we validated the in vitro efficacies of some investigational and marketed drugs, of which niclosamide, an anthelmintic drug approved by the FDA, was further investigated in vivo. We also quantified the proteomic changes during niclosamide treatment to pinpoint nucleoside diphosphate kinase 3 (NME3) downregulation as a potential mechanism for its antitumor activity. CONCLUSIONS: Our results establish a gene expression-based strategy to interrogate cancer biology and inform drug discovery and repositioning for high-risk neuroblastoma.
Affiliation(s)
- Chen-Tsung Huang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Chiao-Hui Hsieh
- Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan
- Wen-Chi Lee
- Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan
- Yen-Lin Liu
- Department of Pediatrics, Taipei Medical University Hospital, Taipei, Taiwan
- Tsai-Shan Yang
- Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Wen-Ming Hsu
- Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Yen-Jen Oyang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Hsuan-Cheng Huang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Hsueh-Fen Juan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan
- Department of Life Science, National Taiwan University, Taipei, Taiwan
9
Borisov N, Shabalina I, Tkachev V, Sorokin M, Garazha A, Pulin A, Eremin II, Buzdin A. Shambhala: a platform-agnostic data harmonizer for gene expression data. BMC Bioinformatics 2019; 20:66. [PMID: 30727942 PMCID: PMC6366102 DOI: 10.1186/s12859-019-2641-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/14/2018] [Accepted: 01/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. Results: Unlike previously published methods, which enable good quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparisons. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. Conclusion: Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method supporting sample-specific and platform-independent biologically meaningful clustering for the data obtained from multiple experimental platforms. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2641-8) contains supplementary material, which is available to authorized users.
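Quantile normalization, one of the comparison methods named in the abstract above, can be sketched in a few lines. This is a minimal pure-Python illustration of the textbook procedure, not the code evaluated in the paper; the function name `quantile_normalize` is hypothetical, and ties are broken by input order rather than averaged. It also shows why the result depends on the set of profiles normalized together: the reference distribution is the rank-wise mean of whatever profiles are supplied.

```python
def quantile_normalize(profiles):
    """Quantile-normalize a list of profiles.

    profiles: list of equal-length lists, one per sample, with genes
    in the same order in every sample.
    """
    n_genes = len(profiles[0])
    # Sort each sample, then average across samples rank by rank
    # to obtain the shared reference distribution.
    sorted_profiles = [sorted(p) for p in profiles]
    reference = [
        sum(sp[rank] for sp in sorted_profiles) / len(profiles)
        for rank in range(n_genes)
    ]
    normalized = []
    for p in profiles:
        # Gene indices ordered by expression within this sample.
        order = sorted(range(n_genes), key=lambda g: p[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Each gene receives the reference value for its rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


two = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(two)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Adding a third profile to the input changes the reference distribution and therefore the normalized values of the first two, so results from separate runs are not directly comparable.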
Affiliation(s)
- Nicolas Borisov
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Irina Shabalina
- Faculty of Mathematics and Information Technologies, Petrozavodsk State University, Anokhina str., 20, Petrozavodsk, 185910, Russia
- Victor Tkachev
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Maxim Sorokin
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
- Andrew Garazha
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Laboratory of Bioinformatics, Oncology and Immunology, D. Rogachyov Federal Research Center of Pediatric Hematology, Moscow, 117198, Russia
- Andrey Pulin
- Laboratory for Cell Biology and Developmental Pathology, Federal State Institution "Institute of General Pathology and Pathophysiology", FSBSI "IGPP", Moscow, Russia
- Ilya I Eremin
- Department for Regenerative Medicine, JSC Generium, Moscow, Russia
- Anton Buzdin
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
10
Li AH, Bradic J. Boosting in the Presence of Outliers: Adaptive Classification With Nonconvex Loss Functions. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2016.1273116] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
Affiliation(s)
- Alexander Hanbo Li
- Department of Mathematics, University of California at San Diego, La Jolla, CA
- Jelena Bradic
- Department of Mathematics, University of California at San Diego, La Jolla, CA
11
Ou-Yang L, Zhang XF, Wu M, Li XL. Node-based learning of differential networks from multi-platform gene expression data. Methods 2017; 129:41-49. [DOI: 10.1016/j.ymeth.2017.05.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/15/2017] [Revised: 04/11/2017] [Accepted: 05/18/2017] [Indexed: 01/07/2023] Open
12
Borisov N, Suntsova M, Sorokin M, Garazha A, Kovalchuk O, Aliper A, Ilnitskaya E, Lezhnina K, Korzinkin M, Tkachev V, Saenko V, Saenko Y, Sokov DG, Gaifullin NM, Kashintsev K, Shirokorad V, Shabalina I, Zhavoronkov A, Mishra B, Cantor CR, Buzdin A. Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data. Cell Cycle 2017; 16:1810-1823. [PMID: 28825872 DOI: 10.1080/15384101.2017.1361068] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Indexed: 01/08/2023] Open
Abstract
High-throughput technologies opened a new era in biomedicine by enabling massive analysis of gene expression at both RNA and protein levels. Unfortunately, expression data obtained in different experiments are often poorly compatible, even for the same biologic samples. Here, using experimental and bioinformatic investigation of major experimental platforms, we show that aggregation of gene expression data at the level of molecular pathways helps to diminish cross- and intra-platform bias otherwise clearly seen at the level of individual genes. We created a mathematical model of cumulative suppression of data variation that predicts the ideal parameters and the optimal size of a molecular pathway. We compared the abilities of 5 alternative methods to aggregate experimental molecular data and also evaluated their capacity to retain meaningful features of biologic samples. The bioinformatic method OncoFinder showed optimal performance in both tests and should be very useful for future cross-platform data analyses.
Affiliation(s)
- Nicolas Borisov
- Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, National Research Centre "Kurchatov Institute", Moscow, Russia; Department of R&D, First Oncology Research and Advisory Center, Moscow, Russia
- Maria Suntsova
- Department of R&D, Center for Biogerontology and Regenerative Medicine, Moscow, Russia; Laboratory of Bioinformatics, D. Rogachyov Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Maxim Sorokin
- Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, National Research Centre "Kurchatov Institute", Moscow, Russia; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Andrew Garazha
- Department of R&D, Center for Biogerontology and Regenerative Medicine, Moscow, Russia; Department of R&D, OmicsWay Corporation, Walnut, CA, USA
- Olga Kovalchuk
- Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada
- Alexander Aliper
- Laboratory of Bioinformatics, D. Rogachyov Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Elena Ilnitskaya
- Department of R&D, Center for Biogerontology and Regenerative Medicine, Moscow, Russia
- Ksenia Lezhnina
- Department of R&D, First Oncology Research and Advisory Center, Moscow, Russia
- Mikhail Korzinkin
- Department of R&D, Center for Biogerontology and Regenerative Medicine, Moscow, Russia
- Victor Tkachev
- Department of R&D, OmicsWay Corporation, Walnut, CA, USA
- Vyacheslav Saenko
- Technological Research Institute S.P. Kapitsa, Ulyanovsk State University, Ulyanovsk, Russia
- Yury Saenko
- Technological Research Institute S.P. Kapitsa, Ulyanovsk State University, Ulyanovsk, Russia
- Dmitry G Sokov
- Chemotherapy Department, Moscow 1st Oncological Hospital, Moscow, Russia
- Nurshat M Gaifullin
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, Russia; Department of Oncology, Russian Medical Postgraduate Academy, Moscow, Russia
- Kirill Kashintsev
- Chemotherapy Department, Moscow Oncological Hospital 62, Stepanovskoye, Russia
- Valery Shirokorad
- Chemotherapy Department, Moscow Oncological Hospital 62, Stepanovskoye, Russia
- Irina Shabalina
- Faculty of Mathematics and Information Technologies, Petrozavodsk State University, Petrozavodsk, Russia
- Alex Zhavoronkov
- Laboratory of Bioinformatics, D. Rogachyov Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Charles R Cantor
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Anton Buzdin
- Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, National Research Centre "Kurchatov Institute", Moscow, Russia; Department of R&D, First Oncology Research and Advisory Center, Moscow, Russia; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of R&D, OmicsWay Corporation, Walnut, CA, USA
|
13
|
Ou-Yang L, Yan H, Zhang XF. Identifying differential networks based on multi-platform gene expression data. Mol Biosyst 2017; 13:183-192. [PMID: 27868129 DOI: 10.1039/c6mb00619a] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
Exploring how the structure of a gene regulatory network differs between two different disease states is fundamental for understanding the biological mechanisms behind disease development and progression. Recently, with rapid advances in microarray technologies, gene expression profiles of the same patients can be collected from multiple microarray platforms. However, previous differential network analysis methods were usually developed based on a single type of platform, which could not utilize the common information shared across different platforms. In this study, we introduce a multi-view differential network analysis model to infer the differential network between two different patient groups based on gene expression profiles collected from multiple platforms. Unlike previous differential network analysis models that need to analyze each platform separately, our model can draw support from multiple data platforms to jointly estimate the differential networks and produce more accurate and reliable results. Our simulation studies demonstrate that our method consistently outperforms other available differential network analysis methods. We also applied our method to identify network rewiring associated with platinum resistance using TCGA ovarian cancer samples. The experimental results demonstrate that the hub genes in our identified differential networks on the PI3K/AKT/mTOR pathway play an important role in drug resistance.
Affiliation(s)
- Le Ou-Yang
- College of Information Engineering, Shenzhen University, Shenzhen, China; Department of Electronic and Engineering, City University of Hong Kong, Hong Kong, China
- Hong Yan
- Department of Electronic and Engineering, City University of Hong Kong, Hong Kong, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
|
14
|
Differential network analysis from cross-platform gene expression data. Sci Rep 2016; 6:34112. [PMID: 27677586 PMCID: PMC5039701 DOI: 10.1038/srep34112] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/24/2016] [Accepted: 09/07/2016] [Indexed: 01/18/2023] Open
Abstract
Understanding how the structure of a gene dependency network changes between two patient-specific groups is an important task for genomic research. Although many computational approaches have been proposed to undertake this task, most of them estimate correlation networks from group-specific gene expression data independently, without considering the common structure shared between different groups. In addition, with the development of high-throughput technologies, we can collect gene expression profiles of the same patients from multiple platforms. Therefore, inferring differential networks by considering cross-platform gene expression profiles will improve the reliability of network inference. We introduce a two-dimensional joint graphical lasso (TDJGL) model to simultaneously estimate group-specific gene dependency networks from gene expression profiles collected from different platforms and infer differential networks. TDJGL can borrow strength across different patient groups and data platforms to improve the accuracy of estimated networks. Simulation studies demonstrate that TDJGL provides more accurate estimates of gene networks and differential networks than previous competing approaches. We apply TDJGL to the PI3K/AKT/mTOR pathway in ovarian tumors to build differential networks associated with platinum resistance. The hub genes of our inferred differential networks are significantly enriched with known platinum resistance-related genes and include potential platinum resistance-related genes.
|
15
|
Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery. Microarrays 2015; 4:389-406. [PMID: 27600230 PMCID: PMC4996376 DOI: 10.3390/microarrays4030389] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 07/06/2015] [Revised: 08/16/2015] [Accepted: 08/17/2015] [Indexed: 01/24/2023]
Abstract
The diagnostic and prognostic potential of the vast quantity of publicly available microarray data has driven the development of methods for integrating the data from different microarray platforms. Cross-platform integration, when appropriately implemented, has been shown to improve reproducibility and robustness of gene signature biomarkers. Microarray platform integration can be conceptually divided into approaches that perform early stage integration (cross-platform normalization) versus late stage data integration (meta-analysis). A growing number of statistical methods and associated software for platform integration are available to the user; however, an understanding of their comparative performance and potential pitfalls is critical for best implementation. In this review we provide evidence-based, practical guidance to researchers performing cross-platform integration, particularly with the objective of discovering biomarkers.
|