1
Li L, Shen L, Wu H, Li M, Chen L, Zhou Q, Ma J, Huai C, Zhou W, Wei M, Zhao M, Zhao X, Du H, Jiang B, Sun Y, Zhang N, Qin S, Xing T. An integrated analysis identifies six molecular subtypes of pancreatic ductal adenocarcinoma revealing cellular and molecular landscape. Carcinogenesis 2023; 44:726-740. [PMID: 37747815 DOI: 10.1093/carcin/bgad068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 09/05/2023] [Accepted: 09/22/2023] [Indexed: 09/27/2023]
Abstract
Pancreatic ductal adenocarcinoma (PDA) has a high mortality rate. Despite continuous efforts, current histopathological classification is insufficient to guide individualized therapy for PDA. We first defined the molecular subtypes of PDA (MSOP) based on a meta-cohort of 845 samples from 11 PDA datasets. We then performed functional analyses involving immunity, fibrosis and metabolism. We identified six molecular subtypes with different survival statistics and molecular composition. The squamous basal-like (SBL) subtype had a poor prognosis and high infiltration of ENO1+ (Enolase 1)/ADM+ (Adrenomedullin) cancer-associated fibroblasts (CAFs). The immune mesenchymal-like (IML) and normal mesenchymal-like (NML) subtypes were characterized by genes associated with extracellular matrix (ECM) activities and immune responses, and had favorable prognoses. IML featured elevated exhausted immune signaling and infiltration of inflammatory CAFs, whereas NML featured infiltration of myofibroblastic CAFs. The exocrine-like (EL) subtype was high in exocrine signals, while the pure classical-like (PCL) subtype lacked immunocyte infiltration. The quiescent-like (QL) subtype had diminished metabolic signaling and high infiltration of NK cells. SBL, IML and NML were enriched in innate anti-PD-1 resistance signatures. In sum, this MSOP depicts a cell-to-molecular atlas of the tumor microenvironment of PDA and might facilitate the design of precise combination therapies that target immunity, metabolism and stroma.
Affiliation(s)
- Lixing Li
  - Department of General Surgery, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Department of Liver Surgery, Liver Cancer Institute, Zhongshan Hospital, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Fudan University, Shanghai, China
- Lu Shen
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Hao Wu
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Mo Li
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Luan Chen
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Qiang Zhou
  - Department of Liver Surgery, Liver Cancer Institute, Zhongshan Hospital, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Fudan University, Shanghai, China
- Jingsong Ma
  - Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou 310024, Zhejiang Province, China
  - School of Engineering, Westlake University, Hangzhou 310024, Zhejiang Province, China
- Cong Huai
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Wei Zhou
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Muyun Wei
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Mingzhe Zhao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Xianglong Zhao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Huihui Du
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Bixuan Jiang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Yidan Sun
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Na Zhang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Shengying Qin
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
- Tonghai Xing
  - Department of General Surgery, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
2
Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023; 10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]
Abstract
Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis. Thus, the outputs of such normalizations are poorly compatible with each other. Recently we proposed a new approach to gene expression data normalization termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with flexible output data format. These criteria dealt with biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs validation datasets and more than twofold lower instability in the calculation of drug efficiency scores compared with other methods.
Affiliation(s)
- Nicolas Borisov
  - Omicsway Corp, Walnut, CA, United States
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander Simonov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
- Maxim Sorokin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
  - Clinic for Neurosurgery, Laboratory of Experimental Neurooncology, Johannes Gutenberg University Medical Centre, Mainz, Germany
- Denis Kuzmin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Betul Karademir-Yilmaz
  - Department of Biochemistry, School of Medicine/Genetic and Metabolic Diseases Research and Investigation Center (GEMHAM) Marmara University, Istanbul, Türkiye
- Anton Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
3
Ye H, Zhang X, Wang C, Goode EL, Chen J. Batch-effect correction with sample remeasurement in highly confounded case-control studies. Nat Comput Sci 2023; 3:709-719. [PMID: 38177326 PMCID: PMC10993308 DOI: 10.1038/s43588-023-00500-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/06/2022] [Accepted: 07/11/2023] [Indexed: 01/06/2024]
Abstract
Batch effects are pervasive in biomedical studies. One approach to addressing batch effects is to repeatedly measure a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch-effect correction with remeasured samples are severely underdeveloped. Here we developed a framework for batch-effect correction using remeasured samples in highly confounded case-control studies. We provided theoretical analyses of the proposed procedure, evaluated its power characteristics and provided a power calculation tool to aid in the study design. We found that the number of samples that need to be remeasured depends strongly on the between-batch correlation. When the correlation is high, remeasuring a small subset of samples can rescue most of the power.
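As a rough illustration of the remeasurement idea described in this abstract, the sketch below estimates a batch shift from samples measured in both batches and subtracts it from the second batch. It is not the authors' estimator or their power-calculation tool: it assumes a constant additive batch effect, and the function names and data are hypothetical.

```python
from statistics import mean


def estimate_batch_shift(batch1_values, batch2_values):
    """Mean paired difference (batch 2 minus batch 1) over remeasured samples.

    Both lists must hold measurements of the SAME samples, in the same order.
    Assumes the batch effect is a constant additive offset, which is an
    illustrative simplification and not the model from the paper.
    """
    if len(batch1_values) != len(batch2_values) or not batch1_values:
        raise ValueError("need equal-length, non-empty paired measurements")
    return mean(b2 - b1 for b1, b2 in zip(batch1_values, batch2_values))


def correct_batch(values, shift):
    """Remove the estimated additive shift from batch-2 measurements."""
    return [v - shift for v in values]


# Hypothetical data: three samples measured in both batches.
remeasured_b1 = [5.0, 6.0, 7.0]
remeasured_b2 = [6.5, 7.5, 8.5]  # same samples, observed in batch 2
shift = estimate_batch_shift(remeasured_b1, remeasured_b2)

# Batch-2-only samples are corrected with the estimated shift.
cases_b2 = [9.0, 10.0]
corrected = correct_batch(cases_b2, shift)
```

With few remeasured samples the estimated shift is noisy, which is one intuition for why the abstract ties the required number of remeasurements to the between-batch correlation.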
Affiliation(s)
- Hanxuan Ye
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Xianyang Zhang
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Chen Wang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Ellen L Goode
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Jun Chen
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
4
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Adv Genet (Hoboken) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNAs) are neither sparsely expressed nor expressed at lower levels than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
  - Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
- Haoning Howard Cen
  - Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- Iain J Gallagher
  - School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
- James D. Johnson
  - Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- James A. Timmons
  - Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
  - Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
5
Nakamura N, Hamada R, Kaneko H, Ohta S. Selecting optimum miRNA panel for miRNA signature-based companion diagnostic model to predict the response of R-CHOP treatment in diffuse large B-cell lymphoma. J Biosci Bioeng 2023:S1389-1723(23)00022-1. [PMID: 36732209 DOI: 10.1016/j.jbiosc.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 12/21/2022] [Accepted: 01/11/2023] [Indexed: 02/01/2023]
Abstract
Diffuse large B-cell lymphoma (DLBCL) is the most common type of malignant lymphoma. Although the first-line treatment, R-CHOP treatment, shows efficacy in approximately 80% of patients with DLBCL, some patients have refractory disease or relapse after the initial response to therapy, resulting in a significantly poorer prognosis. In this study, we developed a microRNA (miRNA) signature-based companion diagnostic model to predict the response of patients with DLBCL to R-CHOP treatment by integrating two clinical study datasets. To select the optimum miRNA combination as a panel, we examined three feature selection methods (p-value-based ranking, stepwise method, and Boruta), together with 11 types of classifiers systematically. Boruta selection enabled a higher area under the curve (AUC) with a lower number of miRNAs compared with other feature selection methods, leading to an AUC of 0.751 via the random forest classifier using 36 miRNAs. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis suggested that Boruta avoided multiple selection of miRNAs with similar functions, thereby preventing the decrease in diagnostic ability via collinearity. The AUC value first increased with an increasing number of miRNAs and then became almost constant at approximately 30 miRNAs, suggesting the existence of the optimum number of miRNAs as a panel for future clinical translation of multiple miRNA-based diagnostics.
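The abstract compares panels by area under the ROC curve (AUC). As a small aside on what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score. The label encoding, scores and function name are hypothetical and unrelated to the study's data or classifiers.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for one class (e.g. responders), 0 for the other; this encoding
            is an assumption made for the example.
    scores: classifier outputs, higher meaning "more likely positive".
    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(labels, scores)
```

Read this way, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.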
6
Adamer MF, Brüningk SC, Tejada-Arranz A, Estermann F, Basler M, Borgwardt K. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. Bioinform Adv 2022; 2:vbac071. [PMID: 36699372 PMCID: PMC9710604 DOI: 10.1093/bioadv/vbac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/01/2022] [Accepted: 09/26/2022] [Indexed: 01/28/2023]
Abstract
Motivation: With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions and residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem cannot currently be solved satisfactorily by any common batch-correction algorithm. Results: We present reComBat, a regularized version of the empirical Bayes method to overcome this limitation, and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. Availability and implementation: The code is available at https://github.com/BorgwardtLab/reComBat; all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Marek Basler
  - Biozentrum, University of Basel, Basel 4056, Switzerland
- Karsten Borgwardt
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
  - Swiss Institute for Bioinformatics (SIB), Lausanne 1015, Switzerland
7
Eskandarian P, Mohasefi JB, Pirnejad H, Niazkhani Z. A novel artificial neural network improves multivariate feature extraction in predicting correlated multivariate time series. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
8
Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022; 10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]
Abstract
(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also raised the question of how to compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, but their usefulness is strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages, which can be generally classified as either flexible or predefined format harmonizers, have been proposed, but none has to date become the gold standard for unification of this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced, and not only for comparisons of limited transcriptomic datasets. Instruments have been proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable computational cost. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we attempted to overview the landscape of modern approaches and methods in transcriptomic harmonization and focused on the practical aspects of their application.
9
Huang HH, Rao H, Miao R, Liang Y. A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression. BMC Bioinformatics 2022; 23:353. [PMID: 35999505 PMCID: PMC9396780 DOI: 10.1186/s12859-022-04887-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 12/22/2022]
Abstract
Background Gene expression analysis can provide useful information for analyzing complex biological mechanisms. However, many reported findings are unrepeatable due to small sample sizes relative to a large number of genes and the low signal-to-noise ratios of most gene expression datasets. Results Meta-analysis of multi-data sets is an efficient method for tackling the above problem. To improve the performance of meta-analysis, we propose a novel meta-analysis framework. It consists of two parts: (1) a novel data augmentation strategy. Various cross-platform normalization methods exist, which can preserve original biological information of gene expression datasets from different angles and add different “perturbations” to the dataset. Using such perturbation, we provide a feasible means for gene expression data augmentation; (2) elastic data shared lasso (DSL-L2). The DSL-L2 method spans the continuum between individual models for each dataset and one model for all datasets. It also overcomes the shortcomings of the data shared lasso method when dealing with highly correlated features. Comprehensive simulation experiment results show that the proposed method has high prediction and gene selection performance. We then apply the proposed method to non-small cell lung cancer (NSCLC) blood gene expression data in order to identify key tumor-related genes. The outcomes of our experiment indicate that the method could be used for identifying a set of robust disease-related gene signatures that may be used for NSCLC early diagnosis or prognosis or even targeting. Conclusion We propose a novel and effective meta-analysis method for biological research, extrapolating and integrating information from multiple gene expression datasets.
Affiliation(s)
- Hai-Hui Huang
  - Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Hao Rao
  - Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Rui Miao
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Yong Liang
  - The Peng Cheng Laboratory, Shenzhen, China
10
Yu J, Tu W, Payne A, Rudyk C, Cuadros Sanchez S, Khilji S, Kumarathasan P, Subedi S, Haley B, Wong A, Anghel C, Wang Y, Chauhan V. Adverse Outcome Pathways and Linkages to Transcriptomic Effects Relevant to Ionizing Radiation Injury. Int J Radiat Biol 2022; 98:1789-1801. [PMID: 35939063 DOI: 10.1080/09553002.2022.2110313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/09/2023]
Abstract
BACKGROUND In the past three decades, a large body of data on the effects of exposure to ionizing radiation and the ensuing changes in gene expression has been generated. These data have allowed for an understanding of molecular-level events and shown a level of consistency in response despite the vast formats and experimental procedures being used across institutions. However, clarity on how this information may inform strategies for health risk assessment needs to be explored. An approach to bridge this gap is the adverse outcome pathway (AOP) framework. AOPs represent an illustrative framework characterizing a stressor associated with a sequential set of causally linked key events (KEs) at different levels of biological organization, beginning with a molecular initiating event (MIE) and culminating in an adverse outcome (AO). Here, we demonstrate the interpretation of transcriptomic datasets in the context of the AOP framework within the field of ionizing radiation by using a lung cancer AOP (AOP 272: https://www.aopwiki.org/aops/272) as a case example. METHODS Through the mining of the literature, radiation exposure-related transcriptomic studies in line with AOP 272 related to lung cancer, DNA damage response, and repair were identified. The differentially expressed genes within relevant studies were collated and subjected to the pathway and network analysis using Reactome and GeneMANIA platforms. Identified pathways were filtered (p < 0.001, ≥ 3 genes) and categorized based on relevance to KEs in the AOP. Gene connectivities were identified and further grouped by gene expression-informed associated events (AEs). Relevant quantitative dose-response data were used to inform the directionality in the expression of the genes in the network across AEs. RESULTS Reactome analyses identified 7 high-level biological processes with multiple pathways and associated genes that mapped to potential KEs in AOP 272. 
The gene connectivities were further represented as a network of AEs with associated expression profiles that highlighted patterns of gene expression levels. CONCLUSIONS This study demonstrates the application of transcriptomics data in AOP development and provides information on potential data gaps. Although the approach is new and anticipated to evolve, it shows promise for improving the understanding of underlying mechanisms of disease progression with a long-term vision to be predictive of adverse outcomes.
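The filtering step described in the METHODS of this abstract (p < 0.001 and at least 3 genes per pathway) can be sketched as follows. This is a minimal illustration under stated assumptions: the `PathwayHit` record, the function name and the example pathways, gene lists and p-values are all made up for the example and are not taken from the study or from the Reactome or GeneMANIA outputs.

```python
from dataclasses import dataclass


@dataclass
class PathwayHit:
    """One enrichment result: pathway name, p-value and mapped genes."""
    name: str
    p_value: float
    genes: tuple


def filter_pathways(hits, max_p=0.001, min_genes=3):
    """Keep hits with p strictly below max_p and at least min_genes genes."""
    return [h for h in hits if h.p_value < max_p and len(h.genes) >= min_genes]


# Made-up enrichment results, for illustration only.
hits = [
    PathwayHit("DNA repair", 0.0002, ("ATM", "BRCA1", "RAD51")),
    PathwayHit("Cell cycle", 0.0005, ("CDKN1A", "CCNB1")),   # too few genes
    PathwayHit("Apoptosis", 0.02, ("BAX", "FAS", "TP53")),    # p too large
]
kept = [h.name for h in filter_pathways(hits)]
```

Both thresholds are parameters so the same sketch can be rerun with different cut-offs when checking how sensitive the retained pathway set is to them.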
Affiliation(s)
- Jihang Yu
  - Canadian Nuclear Laboratories, Chalk River, Ontario, Canada
- Wangshu Tu
  - Carleton University, Ottawa, Ontario, Canada
- Chris Rudyk
  - Carleton University, Ottawa, Ontario, Canada
- Brittany Haley
  - Canadian Nuclear Laboratories, Chalk River, Ontario, Canada
- Alicia Wong
  - Canadian Nuclear Laboratories, Chalk River, Ontario, Canada
  - McMaster University, Hamilton, Ontario, Canada
- Yi Wang
  - Canadian Nuclear Laboratories, Chalk River, Ontario, Canada
  - University of Ottawa, Ottawa, Ontario, Canada
11
Rincourt SL, Michiels S, Drubay D. Complex Disease Individual Molecular Characterization Using Infinite Sparse Graphical Independent Component Analysis. Cancer Inform 2022; 21:11769351221105776. [PMID: 35860346 PMCID: PMC9290103 DOI: 10.1177/11769351221105776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 05/22/2022] [Indexed: 11/16/2022]
Abstract
Identifying individual mechanisms involved in complex diseases, such as cancer, is essential for precision medicine. Their characterization is particularly challenging due to the unknown relationships of high-dimensional omics data and their inter-patient heterogeneity. We propose to model individual gene expression as a combination of unobserved molecular mechanisms (molecular components) that may differ between the individuals. Considering a baseline molecular profile common to all individuals, these molecular components may represent molecular pathways differing from the population background. We defined an infinite sparse graphical independent component analysis (isgICA) to identify these molecular components. This model relies on double sparseness: the source matrix sparseness defines the subset of genes involved in each molecular component, whereas the weight matrix sparseness identifies the subset of molecular components associated with each patient. As the number of molecular components is unknown but likely high, we simultaneously inferred it and the weight matrix sparseness using the beta-Bernoulli process (BBP). We simulated data from a double sparse ICA with 10/30 components with specific sparseness structures for 100/500 individuals and 500/1000/5000 genes with different noise variance levels to evaluate the reconstruction of the latent structures by our model. For all simulations, the isgICA was able to reconstruct with higher accuracy than 2 state-of-the-art methods (ica and fastICA) the number of components, the weight and source matrix sparsenesses (correlation simulated/estimated >.8). Applying our model to the expression of 1063 genes of 614 breast cancer patients, the isgICA identified 22 components. According to the source matrix, 7 of these 22 components seemed to be specifically related to 3 known molecular pathways with a prognostic effect in early breast cancer (immune system, proliferation, and stroma invasion). 
This proposed algorithm provides an insight into individual molecular heterogeneity to better understand complex disease mechanisms.
Collapse
Affiliation(s)
- Sarah-Laure Rincourt
- Oncostat U1018, Inserm, University Paris-Saclay, Labelled Ligue Contre le Cancer, Villejuif, France
| | - Stefan Michiels
- Oncostat U1018, Inserm, University Paris-Saclay, Labelled Ligue Contre le Cancer, Villejuif, France.,Department of Biostatistics and Epidemiology, Gustave Roussy, University Paris-Saclay, Villejuif, France
| | - Damien Drubay
- Oncostat U1018, Inserm, University Paris-Saclay, Labelled Ligue Contre le Cancer, Villejuif, France.,Department of Biostatistics and Epidemiology, Gustave Roussy, University Paris-Saclay, Villejuif, France
|
12
|
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F, Yang G. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. Inf Fusion 2022; 82:99-122. [PMID: 35664012 PMCID: PMC8878813 DOI: 10.1016/j.inffus.2022.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 10/24/2021] [Revised: 12/22/2021] [Accepted: 01/07/2022] [Indexed: 05/13/2023]
Abstract
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
Affiliation(s)
- Yang Nan
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Javier Del Ser
- Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
- TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
| | - Simon Walsh
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Carola Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Oncology R&D, AstraZeneca, Cambridge, UK
| | - Ian Selby
- Department of Radiology, University of Cambridge, Cambridge, UK
| | - Kit Howard
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - John Owen
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - Jon Neville
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - Julien Guiot
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
| | - Benoit Ernst
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
| | | | | | - Marion I. Menzel
- Technische Hochschule Ingolstadt, Ingolstadt, Germany
- GE Healthcare GmbH, Munich, Germany
| | - Sean Walsh
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | - Wim Vos
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | - Nina Flerin
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | | | | | - Avishek Chatterjee
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Henry Woodruff
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Philippe Lambin
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Leonor Cerdá-Alberich
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Luis Martí-Bonmatí
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Francisco Herrera
- Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI) University of Granada, Granada, Spain
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Guang Yang
- National Heart and Lung Institute, Imperial College London, London, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
|
13
|
Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022; 2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)
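The abstract's claim that harmonization "retained gene-expression ranks" can be checked with a rank correlation between a sample's values before and after transformation. The sketch below is not the Shambhala-2 protocol itself. It only illustrates the check, using a hand-rolled Spearman correlation and an arbitrary monotone cubic map as a stand-in for a harmonizer. The function names and the example values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for one sample, before harmonization.
pre = [2.0, 5.5, 1.0, 7.25, 3.0]
# Stand-in transform: any strictly increasing map leaves the ranks unchanged.
post = [v ** 3 + v for v in pre]
rho = spearman(pre, post)
```

Any strictly increasing transformation gives a Spearman correlation of 1 between the original and transformed values, so a value well below 1 would indicate that gene ranks were reordered.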
Affiliation(s)
- Nicolas Borisov
- Omicsway Corp., Walnut, California.,Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
| | - Maksim Sorokin
- Omicsway Corp., Walnut, California.,Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia.,I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Marianna Zolotovskaya
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia.,Oncobox Ltd., Moscow, Russia
| | | | - Anton Buzdin
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia.,Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia.,World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia.,PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
|
14
|
Ozturk H, Cingoz H, Tufan T, Yang J, Adair SJ, Tummala KS, Kuscu C, Kinali M, Comertpay G, Nagdas S, Goudreau BJ, Luleyap HU, Bingul Y, Ware TB, Hwang WL, Hsu KL, Kashatus DF, Ting DT, Chandel NS, Bardeesy N, Bauer TW, Adli M. ISL2 is a putative tumor suppressor whose epigenetic silencing reprograms the metabolism of pancreatic cancer. Dev Cell 2022; 57:1331-1346.e9. [PMID: 35508175 DOI: 10.1016/j.devcel.2022.04.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/14/2021] [Revised: 03/11/2022] [Accepted: 04/08/2022] [Indexed: 12/17/2022]
Abstract
Pancreatic ductal adenocarcinoma (PDA) cells reprogram their transcriptional and metabolic programs to survive the nutrient-poor tumor microenvironment. Through in vivo CRISPR screening, we discovered islet-2 (ISL2) as a candidate tumor suppressor that modulates aggressive PDA growth. Notably, ISL2, a nuclear and chromatin-associated transcription factor, is epigenetically silenced in PDA tumors and high promoter DNA methylation or its reduced expression correlates with poor patient survival. The exogenous ISL2 expression or CRISPR-mediated upregulation of the endogenous loci reduces cell proliferation. Mechanistically, ISL2 regulates the expression of metabolic genes, and its depletion increases oxidative phosphorylation (OXPHOS). As such, ISL2-depleted human PDA cells are sensitive to the inhibitors of mitochondrial complex I in vitro and in vivo. Spatial transcriptomic analysis shows heterogeneous intratumoral ISL2 expression, which correlates with the expression of critical metabolic genes. These findings nominate ISL2 as a putative tumor suppressor whose inactivation leads to increased mitochondrial metabolism that may be exploitable therapeutically.
Affiliation(s)
- Harun Ozturk
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Obstetrics and Gynecology, Chicago, IL 60611, USA
| | - Harun Cingoz
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Obstetrics and Gynecology, Chicago, IL 60611, USA
| | - Turan Tufan
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - Jiekun Yang
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - Sara J Adair
- Department of Surgery, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | | | - Cem Kuscu
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - Meric Kinali
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Obstetrics and Gynecology, Chicago, IL 60611, USA
| | | | - Sarbajeet Nagdas
- Department of Cell, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - Bernadette J Goudreau
- Department of Surgery, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | | | - Yagmur Bingul
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Obstetrics and Gynecology, Chicago, IL 60611, USA
| | - Timothy B Ware
- Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
| | - William L Hwang
- Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Ku-Lung Hsu
- Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
| | - David F Kashatus
- Department of Cell, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - David T Ting
- Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Navdeep S Chandel
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Pulmonary and Critical Care and Department of Biochemistry and Molecular Genetics, Chicago, IL 60611, USA
| | - Nabeel Bardeesy
- Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Todd W Bauer
- Department of Surgery, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
| | - Mazhar Adli
- Northwestern University Feinberg School of Medicine, Robert Lurie Comprehensive Cancer Center, Department of Obstetrics and Gynecology, Chicago, IL 60611, USA.
|
15
|
Konovalov N, Timonin S, Asyutin D, Raevskiy M, Sorokin M, Buzdin A, Kaprovoy S. Transcriptomic Portraits and Molecular Pathway Activation Features of Adult Spinal Intramedullary Astrocytomas. Front Oncol 2022; 12:837570. [PMID: 35387112 PMCID: PMC8978956 DOI: 10.3389/fonc.2022.837570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2021] [Accepted: 02/21/2022] [Indexed: 11/30/2022] Open
Abstract
In this study, we report 31 spinal intramedullary astrocytoma (SIA) RNA sequencing (RNA-seq) profiles for 25 adult patients with documented clinical annotations. To our knowledge, this is the first clinically annotated RNA-seq dataset of spinal astrocytomas derived from the intradural intramedullary compartment. We compared these tumor profiles with the previous healthy central nervous system (CNS) RNA-seq data for spinal cord and brain and identified SIA-specific gene sets and molecular pathways. Our findings suggest a trend for SIA-upregulated pathways governing interactions with the immune cells and downregulated pathways for the neuronal functioning in the context of normal CNS activity. In two patient tumor biosamples, we identified diagnostic KIAA1549-BRAF fusion oncogenes, and we also found 16 new SIA-associated fusion transcripts. In addition, we bioinformatically simulated activities of targeted cancer drugs in SIA samples and predicted that several tyrosine kinase inhibitory drugs and thalidomide analogs could be potentially effective as second-line treatment agents to aid in the prevention of SIA recurrence and progression.
Affiliation(s)
| | | | | | - Mikhail Raevskiy
- Omicsway Corp., Walnut, CA, United States.,Moscow Institute of Physics and Technology, Moscow, Russia.,Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia.,I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Maxim Sorokin
- Moscow Institute of Physics and Technology, Moscow, Russia.,Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia.,I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Anton Buzdin
- Omicsway Corp., Walnut, CA, United States.,Moscow Institute of Physics and Technology, Moscow, Russia.,Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia.,I.M. Sechenov First Moscow State Medical University, Moscow, Russia.,Oncobox Ltd., Moscow, Russia
|
16
|
Bouron C, Mathie C, Seegers V, Morel O, Jézéquel P, Lasla H, Guillerminet C, Girault S, Lacombe M, Sher A, Lacoeuille F, Patsouris A, Testard A. Prognostic Value of Metabolic, Volumetric and Textural Parameters of Baseline [18F]FDG PET/CT in Early Triple-Negative Breast Cancer. Cancers (Basel) 2022; 14:cancers14030637. [PMID: 35158904 PMCID: PMC8833829 DOI: 10.3390/cancers14030637] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/04/2021] [Revised: 01/22/2022] [Accepted: 01/23/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary: The aim of this study was to evaluate PET/CT parameters to determine different prognostic groups in TNBC, in order to select patients with a high risk of relapse, for whom therapeutic escalation can be considered. We have demonstrated that the MTV, TLG and entropy of the primary breast lesion could be of interest to predict the prognostic outcome of TNBC patients.
Abstract: (1) Background: triple-negative breast cancer (TNBC) remains a clinical and therapeutic challenge primarily affecting young women with poor prognosis. TNBC is currently treated as a single entity but presents a very diverse profile in terms of prognosis and response to treatment. Positron emission tomography/computed tomography (PET/CT) with 18F-fluorodeoxyglucose ([18F]FDG) is gaining importance for the staging of breast cancers. TNBCs often show high [18F]FDG uptake and some studies have suggested a prognostic value for metabolic and volumetric parameters, but no study to our knowledge has examined textural features in TNBC. The objective of this study was to evaluate the association between metabolic, volumetric and textural parameters measured at the initial [18F]FDG PET/CT and disease-free survival (DFS) and overall survival (OS) in patients with nonmetastatic TNBC. (2) Methods: all consecutive nonmetastatic TNBC patients who underwent a [18F]FDG PET/CT examination upon diagnosis between 2012 and 2018 were retrospectively included. The metabolic and volumetric parameters (SUVmax, SUVmean, SUVpeak, MTV, and TLG) and the textural features (entropy, homogeneity, SRE, LRE, LGZE, and HGZE) of the primary tumor were collected. (3) Results: 111 patients were enrolled (median follow-up: 53.6 months). In the univariate analysis, high TLG, MTV and entropy values of the primary tumor were associated with lower DFS (p = 0.008, p = 0.006 and p = 0.025, respectively) and lower OS (p = 0.002, p = 0.001 and p = 0.046, respectively). The discriminating thresholds for two-year DFS were calculated as 7.5 for MTV, 55.8 for TLG and 2.6 for entropy. The discriminating thresholds for two-year OS were calculated as 9.3 for MTV, 57.4 for TLG and 2.67 for entropy. In the multivariate analysis, lymph node involvement in PET/CT was associated with lower DFS (p = 0.036), and the high MTV of the primary tumor was correlated with lower OS (p = 0.014). (4) Conclusions: textural features associated with metabolic and volumetric parameters of baseline [18F]FDG PET/CT have a prognostic value for identifying high-relapse-risk groups in early TNBC patients.
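As a small worked illustration of how the two-year DFS thresholds quoted above could be applied, the sketch below flags which parameters of a primary tumor exceed them. It rests on assumptions made here: "high" is read as strictly greater than the threshold, TLG is computed with the usual definition MTV × SUVmean, and the patient values are invented. The abstract does not give units, so none are assumed. This is an illustration of the arithmetic only, not a clinical decision rule.

```python
# Two-year DFS thresholds as quoted in the abstract above.
DFS_THRESHOLDS = {"MTV": 7.5, "TLG": 55.8, "entropy": 2.6}


def total_lesion_glycolysis(mtv, suv_mean):
    """Usual definition of TLG: metabolic tumor volume times mean SUV."""
    return mtv * suv_mean


def flag_high(params, thresholds=DFS_THRESHOLDS):
    """Mark each parameter as high when it strictly exceeds its threshold.

    The strict inequality is an assumption; the abstract does not say how
    values exactly equal to a threshold were handled.
    """
    return {name: params[name] > cutoff for name, cutoff in thresholds.items()}


# Hypothetical patient: MTV 10.0 and SUVmean 4.0 give TLG 40.0.
patient = {"MTV": 10.0, "entropy": 2.7}
patient["TLG"] = total_lesion_glycolysis(patient["MTV"], 4.0)
flags = flag_high(patient)
```

For this invented patient, MTV and entropy are above their thresholds while TLG is not, which shows that the three parameters need not agree for a given tumor.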
Affiliation(s)
- Clément Bouron
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
- Department of Nuclear Medicine, University Hospital of Angers, 4 rue Larrey, 49100 Angers, France;
| | - Clara Mathie
- Department of Medical Oncology, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (C.M.); (A.P.)
| | - Valérie Seegers
- Research and Statistics Department, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France;
| | - Olivier Morel
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
| | - Pascal Jézéquel
- Omics Data Science Unit, ICO Pays de la Loire, Bd Jacques Monod, CEDEX, 44805 Saint-Herblain, France; (P.J.); (H.L.)
- CRCINA, UMR 1232 INSERM, Université de Nantes, Université d’Angers, Institut de Recherche en Santé, 8 Quai Moncousu—BP 70721, CEDEX 1, 44007 Nantes, France
| | - Hamza Lasla
- Omics Data Science Unit, ICO Pays de la Loire, Bd Jacques Monod, CEDEX, 44805 Saint-Herblain, France; (P.J.); (H.L.)
| | - Camille Guillerminet
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
- Department of Medical Physics, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France
| | - Sylvie Girault
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
| | - Marie Lacombe
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
| | - Avigaelle Sher
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
| | - Franck Lacoeuille
- Department of Nuclear Medicine, University Hospital of Angers, 4 rue Larrey, 49100 Angers, France;
- CRCINA, University of Nantes and Angers, INSERM UMR1232 équipe 17, 49055 Angers, France
| | - Anne Patsouris
- Department of Medical Oncology, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (C.M.); (A.P.)
- INSERM UMR1232 équipe 12, 49055 Angers, France
| | - Aude Testard
- Department of Nuclear Medicine, ICO Pays de la Loire, 15 rue André Boquel, 49055 Angers, France; (O.M.); (C.G.); (S.G.); (M.L.); (A.S.); (A.T.)
|
17
|
Kawaguchi ES, Li G, Lewinger JP, Gauderman WJ. Two-step hypothesis testing to detect gene-environment interactions in a genome-wide scan with a survival endpoint. Stat Med 2022; 41:1644-1657. [PMID: 35075649 PMCID: PMC9007892 DOI: 10.1002/sim.9319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 11/10/2021] [Accepted: 12/26/2021] [Indexed: 01/13/2023]
Abstract
Defined by their genetic profile, individuals may exhibit differential clinical outcomes due to an environmental exposure. Identifying subgroups based on specific exposure-modifying genes can lead to targeted interventions and focused studies. Genome-wide interaction scans (GWIS) can be performed to identify such genes, but these scans typically suffer from low power due to the large multiple testing burden. We provide a novel framework for powerful two-step hypothesis tests for GWIS with a time-to-event endpoint under the Cox proportional hazards model. In the Cox regression setting, we develop an approach that prioritizes genes for Step-2 G × E testing based on a carefully constructed Step-1 screening procedure. Simulation results demonstrate this two-step approach can lead to substantially higher power for identifying gene-environment (G × E) interactions compared to the standard GWIS while preserving the family-wise error rate over a range of scenarios. In a taxane-anthracycline chemotherapy study for breast cancer patients, the two-step approach identifies several gene expression by treatment interactions that would not be detected using the standard GWIS.
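The general idea of two-step testing can be sketched with a simple subset-testing scheme: only genes that pass a Step-1 screen are carried to Step 2, where the multiplicity correction is applied to the reduced set. This is a generic illustration, not the Cox-specific screening statistics developed in the paper. The function names, the screening level `alpha1`, and the p-values are hypothetical, and error-rate control in such schemes typically relies on the Step-1 statistic being independent of the Step-2 test under the null.

```python
def two_step_test(step1_p, step2_p, alpha1=0.05, alpha=0.05):
    """Return indices of genes declared significant by subset testing.

    step1_p: screening p-values, one per gene.
    step2_p: interaction-test p-values, one per gene.
    """
    # Step 1: keep only genes whose screening p-value passes alpha1.
    passed = [i for i, p in enumerate(step1_p) if p < alpha1]
    if not passed:
        return []
    # Step 2: Bonferroni correction over the screened genes only.
    threshold = alpha / len(passed)
    return [i for i in passed if step2_p[i] < threshold]


def bonferroni_test(p_values, alpha=0.05):
    """Standard one-step scan: Bonferroni correction over all genes."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for four genes.
step1 = [0.001, 0.5, 0.02, 0.9]
step2 = [0.02, 0.0001, 0.2, 0.03]
two_step_hits = two_step_test(step1, step2)
one_step_hits = bonferroni_test(step2)
```

In this toy example gene 0 is detected only by the two-step procedure because its Step-2 threshold is 0.05/2 instead of 0.05/4, while gene 1 is missed because it fails the screen. The power gain therefore depends on how informative the screening statistic is.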
Affiliation(s)
- Eric S Kawaguchi
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Gang Li
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, California, USA.,Department of Computational Medicine, University of California, Los Angeles, Los Angeles, California, USA
| | - Juan Pablo Lewinger
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - W James Gauderman
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
|
18
|
Ma C, Wu M, Ma S. Analysis of cancer omics data: a selective review of statistical techniques. Brief Bioinform 2022; 23:6510158. [PMID: 35039832 DOI: 10.1093/bib/bbab585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022] Open
Abstract
Cancer is an omics disease. The development in high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data, which has higher dimensionality, weaker signals and more complex distributional properties, is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-datasets analysis and multi-omics analysis.
Affiliation(s)
- Chenjin Ma
- College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
| | - Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
19
|
Lu Y, Chen J, Wang S, Tian Z, Fan Y, Wang M, Zhao J, Tang K, Xie J. Identification of Genetic Signature Associated With Aging in Pulmonary Fibrosis. Front Med (Lausanne) 2021; 8:744239. [PMID: 34746180 PMCID: PMC8564051 DOI: 10.3389/fmed.2021.744239] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/22/2021] [Accepted: 09/20/2021] [Indexed: 11/24/2022] Open
Abstract
Background: Aging is a strong risk factor and an independent prognostic factor in idiopathic pulmonary fibrosis (IPF). In this study, we aimed to conduct a comprehensive analysis based on gene expression profiles for the role of aging in pulmonary fibrosis. Method: Four datasets (GSE21411, GSE24206, GSE47460, and GSE101286) for patients with clinical IPF and one dataset for bleomycin (BLM)-induced pulmonary fibrosis (BIPF) mouse model (GSE123293) were obtained from Gene Expression Omnibus (GEO). According to different age ranges, both patients with IPF and BIPF mice were divided into young and aged groups. The differentially expressed genes (DEGs) were systematically analyzed using Gene Ontology (GO) functional, Kyoto Encyclopedia of Genes and Genomes (KEGG), and hub genes analysis. Finally, we verified the role of age and core genes associated with age in vivo. Results: Via the expression profile comparisons of aged and young patients with IPF, we identified 108 aging-associated DEGs, with 21 upregulated and 87 downregulated. The DEGs were associated with “response to glucocorticoid,” “response to corticosteroid,” and “rhythmic process” in GO biological process (BP). For KEGG analysis, the top three significantly enriched KEGG pathways of the DEGs included “IL-17 signaling pathway,” “Mineral absorption,” and “HIF-1-signaling pathway.” Through the comparisons of aged and young BIPF mice, a total number of 778 aging-associated DEGs were identified, with 453 genes increased and 325 genes decreased. For GO and KEGG analysis, the DEGs were enriched in extracellular matrix (ECM) and collagen metabolism. The common DEGs of patients with IPF and BIPF mice were enriched in the BP category, including “induction of bacterial agglutination,” “hyaluronan biosynthetic process,” and “positive regulation of heterotypic cell-cell adhesion.” We confirmed that aged BIPF mice developed more serious pulmonary fibrosis.
Finally, the four aging-associated core genes (Slc2a3, Fga, Hp, and Thbs1) were verified in vivo. Conclusion: This study provides new insights into the impact of aging on pulmonary fibrosis. We also identified four aging-associated core genes (Slc2a3, Fga, Hp, and Thbs1) related to the development of pulmonary fibrosis.
Affiliation(s)
- Yanjiao Lu
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Jinkun Chen
- Department of Science, Western University, London, ON, Canada
| | - Shanshan Wang
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Zhen Tian
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yan Fan
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Meijia Wang
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Jianping Zhao
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Kun Tang
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.,Institute of Respiratory Diseases of Sun Yat-sen University, Guangzhou, China
| | - Jungang Xie
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
20
|
Čuklina J, Lee CH, Williams EG, Sajic T, Collins BC, Rodríguez Martínez M, Sharma VS, Wendt F, Goetze S, Keele GR, Wollscheid B, Aebersold R, Pedrioli PGA. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Mol Syst Biol 2021; 17:e10240. [PMID: 34432947 PMCID: PMC8447595 DOI: 10.15252/msb.202110240] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 01/22/2021] [Revised: 07/16/2021] [Accepted: 07/26/2021] [Indexed: 12/11/2022] Open
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
Affiliation(s)
- Jelena Čuklina
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
- IBM Research Europe, Rüschlikon, Switzerland
| | - Chloe H Lee
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Evan G Williams
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Luxembourg
| | - Tatjana Sajic
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Ben C Collins
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Queen’s University Belfast, Belfast, UK
| | | | - Varun S Sharma
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Fabian Wendt
- Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
| | - Sandra Goetze
- Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
- ETH Zürich, PHRT-CPAC, Zürich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
- ETH Zürich, PHRT-CPAC, Zürich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Faculty of Science, University of Zurich, Zurich, Switzerland
| | - Patrick G A Pedrioli
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
- ETH Zürich, PHRT-CPAC, Zürich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
21
|
Fu Q, Agarwal D, Deng K, Matheson R, Yang H, Wei L, Ran Q, Deng S, Markmann JF. An Unbiased Machine Learning Exploration Reveals Gene Sets Predictive of Allograft Tolerance After Kidney Transplantation. Front Immunol 2021; 12:695806. [PMID: 34305931 PMCID: PMC8297499 DOI: 10.3389/fimmu.2021.695806] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/15/2021] [Accepted: 06/18/2021] [Indexed: 12/14/2022] Open
Abstract
Efforts at finding potential biomarkers of tolerance after kidney transplantation have been hindered by limited sample size, as well as the complicated mechanisms underlying tolerance and the potential risk of rejection after immunosuppressant withdrawal. In this work, three different publicly available genome-wide expression data sets of peripheral blood lymphocyte (PBL) from 63 tolerant patients were used to compare 14 different machine learning models for their ability to predict spontaneous kidney graft tolerance. We found that the Best Subset Selection (BSS) regression approach was the most powerful with a sensitivity of 91.7% and a specificity of 93.8% in the test group, and a specificity of 86.1% and a sensitivity of 80% in the validation group. A feature set with five genes (HLA-DOA, TCL1A, EBF1, CD79B, and PNOC) was identified using the BSS model. EBF1 downregulation was also an independent factor predictive of graft rejection and graft loss. An AUC value of 84.4% was achieved using the two-gene signature (EBF1 and HLA-DOA) as an input to our classifier. Overall, our systematic machine learning exploration suggests novel biological targets that might affect tolerance to renal allografts, and provides clinical insights that can potentially guide patient selection for immunosuppressant withdrawal.
Affiliation(s)
- Qiang Fu
- Organ Transplantation Center, Sichuan Provincial People's Hospital and School of Medicine, University of Electronic Science and Technology of China, Chengdu, China; Center for Transplantation Sciences, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
| | - Divyansh Agarwal
- Division of Transplantation, Department of Surgery, Hospital of the University of Pennsylvania, Philadelphia, PA, United States
| | - Kevin Deng
- Center for Transplantation Sciences, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
| | - Rudy Matheson
- Center for Transplantation Sciences, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
| | - Hongji Yang
- Organ Transplantation Center, Sichuan Provincial People's Hospital and School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Wei
- Organ Transplantation Center, Sichuan Provincial People's Hospital and School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - Qing Ran
- Organ Transplantation Center, Sichuan Provincial People's Hospital and School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - Shaoping Deng
- Organ Transplantation Center, Sichuan Provincial People's Hospital and School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
| | - James F Markmann
- Center for Transplantation Sciences, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
| |
|
22
|
Wang L, Mo C, Wang L, Cheng M. Identification of genes and pathways related to breast cancer metastasis in an integrated cohort. Eur J Clin Invest 2021; 51:e13525. [PMID: 33615456 DOI: 10.1111/eci.13525] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/25/2020] [Revised: 01/20/2021] [Accepted: 02/18/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND Breast cancer is the most common malignant disease in women. Metastasis is the most common cause of death from this cancer. Screening genes related to breast cancer metastasis may help elucidate the mechanisms governing metastasis and identify molecular targets for antimetastatic therapy. The development of advanced algorithms enables us to perform cross-study analysis to improve the robustness of the results. MATERIALS AND METHODS Ten data sets meeting our criteria for differential expression analyses were obtained from the Gene Expression Omnibus (GEO) database. Among these data sets, five based on the same platform were formed into a large cohort using the XPN algorithm. Differentially expressed genes (DEGs) associated with breast cancer metastasis were identified using the differential expression via distance synthesis (DEDS) algorithm. A cross-platform method was employed to verify these DEGs in all ten selected data sets. The top 50 validated DEGs are represented with heat maps. Based on the validated DEGs, Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed. Protein interaction (PPI) networks were constructed to further illustrate the direct and indirect associations among the DEGs. Survival analysis was performed to explore whether these genes can affect breast cancer patient prognosis. RESULTS A total of 817 DEGs were identified using the DEDS algorithm. Of these DEGs, 450 genes were validated by the second algorithm. Enriched KEGG pathway terms demonstrated that these 450 DEGs may be involved in the cell cycle and oocyte meiosis in addition to their functions in ECM-receptor interaction and protein digestion and absorption. PPI network analysis for the proteins encoded by the DEGs indicated that these genes may be primarily involved in the cell cycle and extracellular matrix. In particular, several genes played roles in multiple signalling pathways and were related to patient survival. These genes were also observed to be targetable in the CTD2 database. CONCLUSIONS Our study analysed multiple cross-platform data sets using two different algorithms, helping elucidate the molecular mechanisms and identify several potential therapeutic targets of metastatic breast cancer. In addition, several genes exhibited promise for applications in targeted therapy against metastasis in future research.
Affiliation(s)
- Lingchen Wang
- Center for Experimental Medicine, The First Affiliated Hospital of Nanchang University, Nanchang, China; Department of Biostatistics, School of Public Health, Nanchang University, Nanchang, China
| | - Changgan Mo
- Department of Cardiology, The People's Hospital of Hechi, Hechi, China
| | - Liqin Wang
- Department of Traditional Chinese Medicine, The First Affiliated Hospital of Nanchang University, Nanchang, China
| | - Minzhang Cheng
- Center for Experimental Medicine, The First Affiliated Hospital of Nanchang University, Nanchang, China; Jiangxi Key Laboratory of Molecular Diagnostics and Precision Medicine, Nanchang, China
| |
|
23
|
Buzdin A, Tkachev V, Zolotovskaia M, Garazha A, Moshkovskii S, Borisov N, Gaifullin N, Sorokin M, Suntsova M. Using proteomic and transcriptomic data to assess activation of intracellular molecular pathways. Adv Protein Chem Struct Biol 2021; 127:1-53. [PMID: 34340765 DOI: 10.1016/bs.apcsb.2021.02.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
Analysis of molecular pathway activation is a recent instrument that helps to quantify the activities of various intracellular signaling, structural, DNA synthesis and repair, and biochemical processes. This may have a deep impact on fundamental research, bioindustry, and medicine. Unlike gene ontology analyses and numerous qualitative methods that can establish whether a pathway is affected in principle, the quantitative approach has the advantage of measuring the exact extent of pathway up- or downregulation. This has led to the emergence of a new generation of molecular biomarkers: pathway activation levels, which reflect concentration changes of all measurable pathway components. The input data can be high-throughput proteomic or transcriptomic profiles, and the output numbers take both positive and negative values and correlate positively with overall pathway activation. Due to their nature, pathway activation levels are more robust biomarkers than individual gene product or protein levels. Here, we review the current knowledge of quantitative gene expression interrogation methods and their applications for molecular pathway quantification. We consider the bioinformatic algorithms involved and their applications for solving real-world problems. Besides a plethora of applications in basic life sciences, quantitative pathway analysis can improve molecular design and clinical investigations in the pharmaceutical industry, can help in finding new active biotechnological components, and can contribute significantly to the progressive evolution of personalized medicine. In addition to the theoretical principles and concepts, we also propose publicly available software for using large-scale protein/RNA expression data to assess human pathway activation levels.
|
24
|
Chen Y, Wu T, Zhu Z, Huang H, Zhang L, Goel A, Yang M, Wang X. An integrated workflow for biomarker development using microRNAs in extracellular vesicles for cancer precision medicine. Semin Cancer Biol 2021; 74:134-155. [PMID: 33766650 DOI: 10.1016/j.semcancer.2021.03.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/07/2020] [Revised: 03/13/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023]
Abstract
EV-miRNAs are microRNA (miRNA) molecules encapsulated in extracellular vesicles (EVs), which play crucial roles in tumor pathogenesis, progression, and metastasis. Recent studies about EV-miRNAs have gained novel insights into cancer biology and have demonstrated a great potential to develop novel liquid biopsy assays for various applications. Notably, compared to conventional liquid biomarkers, EV-miRNAs are more advantageous in representing host-cell molecular architecture and exhibiting higher stability and specificity. Despite various available techniques for EV-miRNA separation, concentration, profiling, and data analysis, a standardized approach for EV-miRNA biomarker development is yet lacking. In this review, we performed a substantial literature review and distilled an integrated workflow encompassing important steps for EV-miRNA biomarker development, including sample collection and EV isolation, EV-miRNA extraction and quantification, high-throughput data preprocessing, biomarker prioritization and model construction, functional analysis, as well as validation. With the rapid growth of "big data", we highlight the importance of efficient mining of high-throughput data for the discovery of EV-miRNA biomarkers and integrating multiple independent datasets for in silico and experimental validations to increase the robustness and reproducibility. Furthermore, as an efficient strategy in systems biology, network inference provides insights into the regulatory mechanisms and can be used to select functionally important EV-miRNAs to refine the biomarker candidates. Despite the encouraging development in the field, a number of challenges still hinder the clinical translation. We finally summarize several common challenges in various biomarker studies and discuss potential opportunities emerging in the related fields.
Affiliation(s)
- Yu Chen
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
| | - Tan Wu
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
| | - Zhongxu Zhu
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
| | - Hao Huang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
| | - Liang Zhang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China
| | - Ajay Goel
- Department of Molecular Diagnostics and Experimental Therapeutics, Beckman Research Institute of City of Hope Comprehensive Cancer Center, Duarte, CA, USA
| | - Mengsu Yang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China
| | - Xin Wang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China.
| |
|
25
|
Junet V, Farrés J, Mas JM, Daura X. CuBlock: a cross-platform normalization method for gene-expression microarrays. Bioinformatics 2021; 37:2365-2373. [PMID: 33609102 PMCID: PMC8388031 DOI: 10.1093/bioinformatics/btab105] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/14/2020] [Revised: 02/04/2021] [Accepted: 02/16/2021] [Indexed: 12/28/2022] Open
Abstract
Motivation Cross-(multi)platform normalization of gene-expression microarray data remains an unresolved issue. Despite the existence of several algorithms, they are either constrained by the need to normalize all samples of all platforms together, compromising scalability and reuse, by adherence to the platforms of a specific provider, or simply by poor performance. In addition, many of the methods presented in the literature have not been specifically tested against multi-platform data and/or other methods applicable in this context. Thus, we set out to develop a normalization algorithm appropriate for gene-expression studies based on multiple, potentially large microarray sets collected along multiple platforms and at different times, applicable in systematic studies aimed at extracting knowledge from the wealth of microarray data available in public repositories; for example, for the extraction of Real-World Data to complement data from Randomized Controlled Trials. Our main focus or criterion for performance was on the capacity of the algorithm to properly separate samples from different biological groups. Results We present CuBlock, an algorithm addressing this objective, together with a strategy to validate cross-platform normalization methods. To validate the algorithm and benchmark it against existing methods, we used two distinct datasets, one specifically generated for testing and standardization purposes and one from an actual experimental study. Using these datasets, we benchmarked CuBlock against ComBat (Johnson et al., 2007), UPC (Piccolo et al., 2013), YuGene (Lê Cao et al., 2014), DBNorm (Meng et al., 2017), Shambhala (Borisov et al., 2019) and a simple log2 transform as reference. We note that many other popular normalization methods are not applicable in this context. CuBlock was the only algorithm in this group that could always and clearly differentiate the underlying biological groups after mixing the data, from up to six different platforms in this study. Availability and implementation CuBlock can be downloaded from https://www.mathworks.com/matlabcentral/fileexchange/77882-cublock. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Valentin Junet
- Anaxomics Biotech SL, Barcelona, 08008, Spain; Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, 08193, Spain
| | | | - José M Mas
- Anaxomics Biotech SL, Barcelona, 08008, Spain
| | - Xavier Daura
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, 08193, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, 08010, Spain
| |
|
26
|
Eskandarian P, Bagherzadeh Mohasefi J, Pirnejad H, Niazkhani Z. Prediction of future gene expression profile by analyzing its past variation pattern. Gene Expr Patterns 2021; 39:119166. [PMID: 33444808 DOI: 10.1016/j.gep.2021.119166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Revised: 12/28/2020] [Accepted: 01/07/2021] [Indexed: 01/21/2023]
Abstract
A number of initial Hematopoietic Stem Cells (HSCs) are considered in a container; these cells can divide into HSCs or differentiate into various types of descendant cells. In this paper, a method is designed to predict an approximate gene expression profile (GEP) for future descendant cells resulting from HSC division/differentiation. First, the GEP prediction problem is modeled as a multivariate time series prediction problem. A novel method called EHSCP (Extended Hematopoietic Stem Cell Prediction), an artificial neural machine, is introduced to solve the problem. EHSCP accepts the initial sequence of measured GEPs as input and predicts the GEPs of future descendant cells. This prediction can be performed for multiple stages of cell division/differentiation. EHSCP treats the GEP sequence as time series and computes the correlation between input time series. Two novel artificial neural units called PLSTM (Parametric Long Short Term Memory) and MILSTM (Multi-Input LSTM) are designed. PLSTM enables EHSCP to consider this correlation in output prediction. Since there exist thousands of time series in GEP prediction, a hierarchical encoder is proposed that computes this correlation using 101 MILSTMs. EHSCP is trained using 155 datasets and is evaluated on 39 test datasets. These evaluations show that EHSCP surpasses existing methods in terms of prediction accuracy and the number of correctly predicted division/differentiation stages. In these evaluations, the number of correctly predicted stages in EHSCP was 128 when as many as 8 initial stages were given.
Affiliation(s)
- Parinaz Eskandarian
- Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran.
| | - Jamshid Bagherzadeh Mohasefi
- Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran; Department of Electrical and Computer Engineering, Urmia University, Urmia, Iran.
| | - Habibollah Pirnejad
- Patient Safety Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran.
| | - Zahra Niazkhani
- Nephrology and Kidney Transplant Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran.
| |
|
27
|
Abstract
Carrying out large multicenter studies is one of the key goals to be achieved towards a faster transfer of the radiomics approach to the clinical setting. This requires large-scale radiomics data analysis, hence the need for integrating radiomic features extracted from images acquired in different centers. This is challenging as radiomic features exhibit variable sensitivity to differences in scanner model, acquisition protocols and reconstruction settings, which is similar to the so-called 'batch-effects' in genomics studies. In this review we discuss existing methods to perform data integration with the aim of reducing the unwanted variation associated with batch effects. We also discuss the future potential role of deep learning methods in providing solutions for addressing radiomic multicentre studies.
Affiliation(s)
- R Da-Ano
- LaTiM, INSERM, UMR 1101, Univ Brest, Brest, France
| | - D Visvikis
- LaTiM, INSERM, UMR 1101, Univ Brest, Brest, France
- equally contributed
| | - M Hatt
- LaTiM, INSERM, UMR 1101, Univ Brest, Brest, France
- equally contributed
| |
|
28
|
Leng D, Yi J, Xiang M, Zhao H, Zhang Y. Identification of common signatures in idiopathic pulmonary fibrosis and lung cancer using gene expression modeling. BMC Cancer 2020; 20:986. [PMID: 33046043 PMCID: PMC7552373 DOI: 10.1186/s12885-020-07494-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/14/2019] [Accepted: 10/05/2020] [Indexed: 12/24/2022] Open
Abstract
Background Idiopathic pulmonary fibrosis (IPF) is associated with an increased risk for lung cancer, but the underlying mechanisms driving malignant transformation remain largely unknown. This study aimed to identify differentially expressed genes (DEGs) distinguishing IPF and lung cancer from healthy individuals and common genes driving the transformation from healthy to IPF and lung cancer. Methods The gene expression data for IPF and non-small cell lung cancer (NSCLC) were retrieved from the Gene Expression Omnibus (GEO) database. The DEG signatures were identified via unsupervised two-way clustering (TWC) analysis, supervised support vector machine analysis, dimensional reduction, and mutual exclusivity analysis. Gene enrichment and pathway analyses were performed to identify common signaling pathways. The most significant signature genes in common among IPF and lung cancer were further verified by immunohistochemistry. Results The gene expression data from GSE24206 and GSE18842 were merged into a super array dataset comprising 86 patients with lung disorders (17 IPF and 46 NSCLC) and 51 healthy controls and measuring 23,494 unique genes. Seventy-nine signature DEGs were found among IPF and NSCLC. The peroxisome proliferator-activated receptor (PPAR) signaling pathway was the most enriched pathway associated with lung disorders, and matrix metalloproteinase-1 (MMP-1) in this pathway was mutually exclusive with several genes in IPF and NSCLC. Subsequent immunohistochemical analysis verified enhanced MMP1 expression in NSCLC associated with IPF. Conclusions For the first time, we defined common signature genes for IPF and NSCLC. The mutually exclusive sets of genes were potential drivers for IPF and NSCLC.
Affiliation(s)
- Dong Leng
- Clinical Laboratory, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, 100020, China
| | - Jiawen Yi
- Department of Respiratory and Critical Care Medicine, Beijing Chao-Yang Hospital, Capital Medical University, No. 8 Gongti South Road, Beijing, 100020, China
| | - Maodong Xiang
- Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, 226-8503, Japan
| | - Hongying Zhao
- Department of Pathology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, 100020, China
| | - Yuhui Zhang
- Department of Respiratory and Critical Care Medicine, Beijing Chao-Yang Hospital, Capital Medical University, No. 8 Gongti South Road, Beijing, 100020, China.
| |
|
29
|
Zhang S, Shao J, Yu D, Qiu X, Zhang J. MatchMixeR: a cross-platform normalization method for gene expression data integration. Bioinformatics 2020; 36:2486-2491. [PMID: 31904810 DOI: 10.1093/bioinformatics/btz974] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 09/19/2019] [Accepted: 12/31/2019] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION Combining gene expression (GE) profiles generated from different platforms enables studies that were previously infeasible due to sample size limitations. Several cross-platform normalization methods have been developed to remove the systematic differences between platforms, but they may also remove meaningful biological differences among datasets. In this work, we propose a novel approach that removes the platform differences but not the biological ones. In our method, dubbed 'MatchMixeR', we model platform differences with a linear mixed effects regression (LMER) model and estimate them from matched GE profiles of the same cell line or tissue measured on different platforms. The resulting model can then be used to remove platform differences in other datasets. By using LMER, we achieve a better bias-variance trade-off in parameter estimation. We also design a computationally efficient algorithm based on the moment method, which is ideal for ultra-high-dimensional LMER analysis. RESULTS Compared with several prominent competing methods, MatchMixeR achieved the highest after-normalization concordance. Subsequent differential expression analyses based on datasets integrated from different platforms showed that using MatchMixeR achieved the best trade-off between true and false discoveries, and this advantage is more apparent in datasets with limited samples or unbalanced group proportions. AVAILABILITY AND IMPLEMENTATION Our method is implemented in an R package, 'MatchMixeR', freely available at: https://github.com/dy16b/Cross-Platform-Normalization. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Serin Zhang
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
| | - Jiang Shao
- Gilead Sciences Inc., Foster City, CA 94404, USA
| | - Disa Yu
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
| | - Xing Qiu
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14624, USA
| | - Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
| |
|
30
|
Borisov N, Sorokin M, Tkachev V, Garazha A, Buzdin A. Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020; 13:111. [PMID: 32948183 DOI: 10.1186/s12920-020-00759-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Accepted: 07/27/2020] [Indexed: 12/18/2022] Open
Abstract
Background Machine learning (ML) methods still have limited applicability in personalized oncology due to the low number of available clinically annotated molecular profiles. This prevents sufficient training of ML classifiers that could be used to improve molecular diagnostics. Methods We reviewed published datasets of high-throughput gene expression profiles corresponding to cancer patients with known responses to chemotherapy treatments. We browsed the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. Results We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All the datasets identified were checked for ML applicability and robustness with leave-one-out cross validation. Twenty-three datasets that had balanced numbers of treatment responder and non-responder cases were found suitable for ML. Conclusions We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Among them, seven datasets included RNA sequencing data (for 645 cases), and the others contained microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.
31
Hendrickx DM, Glaab E. Comparative transcriptome analysis of Parkinson's disease and Hutchinson-Gilford progeria syndrome reveals shared susceptible cellular network processes. BMC Med Genomics 2020; 13:114. [PMID: 32811487 PMCID: PMC7437934 DOI: 10.1186/s12920-020-00761-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/01/2020] [Accepted: 08/04/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Parkinson's Disease (PD) and Hutchinson-Gilford Progeria Syndrome (HGPS) are two heterogeneous disorders, which both display molecular and clinical alterations associated with the aging process. However, similarities and differences between molecular changes in these two disorders have not yet been investigated systematically at the level of individual biomolecules and shared molecular network alterations. METHODS Here, we perform a comparative meta-analysis and network analysis of human transcriptomics data from case-control studies for both diseases to investigate common susceptibility genes and sub-networks in PD and HGPS. Alzheimer's disease (AD) and primary melanoma (PM) were included as controls to confirm that the identified overlapping susceptibility genes for PD and HGPS are non-generic. RESULTS We find statistically significant, overlapping genes and cellular processes with significant alterations in both diseases. Interestingly, the majority of these shared affected genes display changes with opposite directionality, indicating that shared susceptible cellular processes undergo different mechanistic changes in PD and HGPS. A complementary regulatory network analysis also reveals that the altered genes in PD and HGPS both contain targets controlled by the upstream regulator CDC5L. CONCLUSIONS Overall, our analyses reveal a significant overlap of affected cellular processes and molecular sub-networks in PD and HGPS, including changes in aging-related processes that may reflect key susceptibility factors associated with age-related risk for PD.
Affiliation(s)
- Diana M. Hendrickx
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, Belvaux, L-4367 Luxembourg
- Enrico Glaab
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, Belvaux, L-4367 Luxembourg
32
Serra A, Fratello M, Cattelani L, Liampa I, Melagraki G, Kohonen P, Nymark P, Federico A, Kinaret PAS, Jagiello K, Ha MK, Choi JS, Sanabria N, Gulumian M, Puzyn T, Yoon TH, Sarimveis H, Grafström R, Afantitis A, Greco D. Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment. Nanomaterials (Basel) 2020; 10:E708. [PMID: 32276469 PMCID: PMC7221955 DOI: 10.3390/nano10040708] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/10/2020] [Revised: 03/25/2020] [Accepted: 03/26/2020] [Indexed: 12/30/2022]
Abstract
Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state-of-the-art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
Affiliation(s)
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Michele Fratello
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Irene Liampa
  - School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Georgia Melagraki
  - Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Pekka Kohonen
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Penny Nymark
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Pia Anneli Sofia Kinaret
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Karolina Jagiello
  - QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
  - University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- My Kieu Ha
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Jang-Sik Choi
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Natasha Sanabria
  - National Institute for Occupational Health, Johannesburg 30333, South Africa
- Mary Gulumian
  - National Institute for Occupational Health, Johannesburg 30333, South Africa
  - Haematology and Molecular Medicine Department, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Tomasz Puzyn
  - QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
  - University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- Tae-Hyun Yoon
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Haralambos Sarimveis
  - School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Roland Grafström
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antreas Afantitis
  - Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
33
Almeida PP, Cardoso CP, de Freitas LM. PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression. BMC Cancer 2020; 20:82. [PMID: 32005189 PMCID: PMC6995241 DOI: 10.1186/s12885-020-6533-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/09/2019] [Accepted: 01/13/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) has high mortality and metastatic potential, yet effective therapies are lacking and the survival rate is low. This scenario calls for new strategies for diagnosis, drug targets, and treatment. METHODS We performed a gene expression microarray meta-analysis of tumor against normal tissues to identify differentially expressed genes (DEG) shared among all datasets, named core-genes (CG). We confirmed CG protein expression in pancreatic tissue through The Human Protein Atlas. Among the genes with protein expression confirmed in the tumor group, we selected the five with the highest area under the curve (AUC) to train an artificial neural network (ANN) to classify samples. RESULTS The microarray meta-analysis included 461 tumor and 187 normal samples. We identified a CG composed of 40 genes, 39 upregulated and one downregulated. The upregulated CG included proteins and extracellular matrix receptors linked to actin cytoskeleton reorganization. With The Human Protein Atlas, we verified that fourteen genes of the CG are translated, with high or medium expression in most of the pancreatic tumor samples. To train our ANN, we selected the best genes (AHNAK2, KRT19, LAMB3, LAMC2, and S100P) to classify the samples based on AUC using mRNA expression. The network classified samples with an F1-score of 0.83 for the normal samples and 0.88 for the PDAC samples, with an average of 0.86. The PDAC-ANN could classify the test samples with a sensitivity of 87.6 and specificity of 83.1. CONCLUSION The gene expression meta-analysis and confirmation of protein expression allowed us to select five genes highly expressed in PDAC samples. We built a Python script to classify the samples based on RNA expression. This software can be useful in PDAC diagnosis.
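For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity and the F1-score follow from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the counts behind the figures reported in this abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, with tumor samples treated as the positive class.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

A per-class F1-score, as quoted for the normal and PDAC samples, is obtained by treating each class in turn as the positive class.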
Affiliation(s)
- Palloma Porto Almeida
  - Núcleo de Biointegração, Instituto Multidisciplinar em Saúde, Universidade Federal da Bahia, Vitória da Conquista, Brazil
- Cristina Padre Cardoso
  - Núcleo de Biointegração, Instituto Multidisciplinar em Saúde, Universidade Federal da Bahia, Vitória da Conquista, Brazil
  - Faculdade Santo Agostinho, Vitória da Conquista, Brazil
- Leandro Martins de Freitas
  - Núcleo de Biointegração, Instituto Multidisciplinar em Saúde, Universidade Federal da Bahia, Vitória da Conquista, Brazil
34
Abstract
Intracellular molecular pathways (IMPs) control all major events in the living cell. IMPs are considered hotspots in biomedical sciences, and thousands of IMPs have been discovered for humans and model organisms. Knowledge of IMP activation is essential for understanding biological functions and differences between biological objects at the molecular level. Here we describe the Oncobox system for accurate quantitative scoring of the activities of up to several thousand molecular pathways based on high-throughput molecular data. Although initially designed for gene expression and mainly RNA sequencing data, Oncobox is now also applicable to quantitative proteomics, microRNA and transcription factor binding site mapping data. The Oncobox system includes modules for gene expression data harmonization, aggregation and comparison, and a recursive algorithm for automatic annotation of molecular pathways. The universal rationale of Oncobox enables scoring of signaling, metabolic, cytoskeleton, immunity, DNA repair, and other pathways in a multitude of biological objects. The Oncobox system can be helpful to all those working in the fields of genetics, biochemistry, interactomics, and big data analytics in molecular biomedicine.
Affiliation(s)
- Nicolas Borisov
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Omicsway Corp., Walnut, CA, USA
- Maxim Sorokin
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Omicsway Corp., Walnut, CA, USA
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Anton Buzdin
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Omicsway Corp., Walnut, CA, USA
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
35
Sibai M, Parlayan C, Tuğlu P, Öztürk G, Demircan T. Integrative Analysis of Axolotl Gene Expression Data from Regenerative and Wound Healing Limb Tissues. Sci Rep 2019; 9:20280. [PMID: 31889169 PMCID: PMC6937273 DOI: 10.1038/s41598-019-56829-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/11/2019] [Accepted: 12/09/2019] [Indexed: 01/08/2023]
Abstract
Axolotl (Ambystoma mexicanum) is a urodele amphibian endowed with remarkable regenerative capacities manifested in scarless wound healing and restoration of amputated limbs, which makes it a powerful experimental model for regenerative biology and medicine. Previous studies have utilized microarrays and RNA-Seq technologies for detecting differentially expressed (DE) genes in different phases of the axolotl limb regeneration. However, sufficient consistency may be lacking due to statistical limitations arising from intra-laboratory analyses. This study aims to bridge such gaps by performing an integrative analysis of publicly available microarray and RNA-Seq data from axolotl limb samples having comparable study designs using the “merging” method. A total of 351 genes were found DE in regenerative samples compared to the control in data of both technologies, showing an adjusted p-value < 0.01 and log fold change magnitudes >1. Downstream analyses illustrated consistent correlations of the directionality of DE genes within and between data of both technologies, as well as concordance with the literature on regeneration related biological processes. qRT-PCR analysis validated the observed expression level differences of five of the top DE genes. Future studies may benefit from the utilized concept and approach for enhanced statistical power and robust discovery of biomarkers of regeneration.
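The selection criterion stated in this abstract (adjusted p-value < 0.01 and log fold change magnitude > 1) amounts to a simple two-condition filter, sketched below. The function name, the gene identifiers and the values are hypothetical placeholders, not data from the study.

```python
def select_de_genes(results, max_adj_p=0.01, min_abs_lfc=1.0):
    """Keep genes whose adjusted p-value is below max_adj_p and whose
    absolute log fold change exceeds min_abs_lfc."""
    return sorted(
        gene
        for gene, (log_fc, adj_p) in results.items()
        if adj_p < max_adj_p and abs(log_fc) > min_abs_lfc
    )


# Hypothetical results: gene -> (log fold change, adjusted p-value).
toy_results = {
    "geneA": (2.3, 0.0004),   # passes both thresholds (upregulated)
    "geneB": (-1.8, 0.002),   # passes both thresholds (downregulated)
    "geneC": (0.6, 0.0001),   # significant, but effect size too small
    "geneD": (3.1, 0.04),     # large effect, but not significant at 0.01
}
de_genes = select_de_genes(toy_results)
print(de_genes)
```

Taking the absolute value keeps both up- and downregulated genes, which matters for the directionality comparisons described in the abstract.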
Affiliation(s)
- Mustafa Sibai
  - Graduate School of Engineering and Natural Sciences, Istanbul Medipol University, Istanbul, Turkey
- Cüneyd Parlayan
  - Regenerative and Restorative Medicine Research Center, REMER, Istanbul Medipol University, Istanbul, Turkey
  - Department of Biomedical Engineering, Faculty of Engineering, İstanbul Medipol University, Istanbul, Turkey
- Pelin Tuğlu
  - Regenerative and Restorative Medicine Research Center, REMER, Istanbul Medipol University, Istanbul, Turkey
- Gürkan Öztürk
  - Regenerative and Restorative Medicine Research Center, REMER, Istanbul Medipol University, Istanbul, Turkey
  - Department of Physiology, International School of Medicine, İstanbul Medipol University, Istanbul, Turkey
- Turan Demircan
  - Regenerative and Restorative Medicine Research Center, REMER, Istanbul Medipol University, Istanbul, Turkey
  - Department of Medical Biology, School of Medicine, Mugla Sitki Kocman University, Mugla, Turkey
36
Alaimo S, Di Maria A, Shasha D, Ferro A, Pulvirenti A. TACITuS: transcriptomic data collector, integrator, and selector on big data platform. BMC Bioinformatics 2019; 20:366. [PMID: 31757212 PMCID: PMC6873396 DOI: 10.1186/s12859-019-2912-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/15/2019] [Accepted: 05/21/2019] [Indexed: 01/21/2023]
Abstract
BACKGROUND Several large public repositories of microarray datasets and RNA-seq data are available. Two prominent examples are ArrayExpress and NCBI GEO. Unfortunately, there is no easy way to import and manipulate data from such resources, because the data are stored in large files that require high bandwidth to download and special-purpose data manipulation tools to extract the subsets relevant to a specific analysis. RESULTS TACITuS is a web-based system that supports rapid query access to high-throughput microarray and NGS repositories. The system is equipped with modules capable of managing large files, storing them in a cloud environment and extracting subsets of data in an easy and efficient way. The system also supports importing data into Galaxy for further analysis. CONCLUSIONS TACITuS automates most of the pre-processing needed to analyze high-throughput microarray and NGS data from large publicly available repositories. The system implements several modules to manage large files in an easy and efficient way. Furthermore, it can work with the Galaxy environment, allowing users to analyze data through a user-friendly interface.
Affiliation(s)
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dipartimento di Matematica e Informatica, Viale A. Doria 6, Catania, 95125, Italy
- Antonio Di Maria
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dipartimento di Matematica e Informatica, Viale A. Doria 6, Catania, 95125, Italy
  - Department of Physics and Astronomy, University of Catania, Viale A. Doria 6, Catania, 95125, Italy
- Dennis Shasha
  - Courant Institute of Mathematical Science, New York University, 251 Mercer St, New York, 10012, USA
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dipartimento di Matematica e Informatica, Viale A. Doria 6, Catania, 95125, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dipartimento di Matematica e Informatica, Viale A. Doria 6, Catania, 95125, Italy
37
Nazarov PV, Wienecke-Baldacchino AK, Zinovyev A, Czerwińska U, Muller A, Nashan D, Dittmar G, Azuaje F, Kreis S. Deconvolution of transcriptomes and miRNomes by independent component analysis provides insights into biological processes and clinical outcomes of melanoma patients. BMC Med Genomics 2019; 12:132. [PMID: 31533822 PMCID: PMC6751789 DOI: 10.1186/s12920-019-0578-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/20/2019] [Accepted: 09/05/2019] [Indexed: 01/21/2023]
Abstract
BACKGROUND The amount of publicly available cancer-related "omics" data is constantly growing and can potentially be used to gain insights into the tumour biology of new cancer patients, their diagnosis and suitable treatment options. However, the integration of different datasets is not straightforward and requires specialized approaches to deal with heterogeneity at technical and biological levels. METHODS Here we present a method that can overcome technical biases, predict clinically relevant outcomes and identify tumour-related biological processes in patients using previously collected large discovery datasets. The approach is based on independent component analysis (ICA), an unsupervised method of signal deconvolution. We developed parallel consensus ICA that robustly decomposes transcriptomics datasets into expression profiles with minimal mutual dependency. RESULTS By applying the method to a small cohort of primary melanoma and control samples combined with a large discovery melanoma dataset, we demonstrate that our method distinguishes cell-type specific signals from technical biases and allows prediction of clinically relevant patient characteristics. We showed the potential of the method to predict cancer subtypes and estimate the activity of key tumour-related processes such as immune response, angiogenesis and cell proliferation. An ICA-based risk score was proposed and its connection to patient survival was validated with an independent cohort of patients. Additionally, through integration of components identified for mRNA and miRNA data, the proposed method helped deduce biological functions of miRNAs, which would otherwise not be possible. CONCLUSIONS We present a method that can be used to map new transcriptomic data from cancer patient samples onto large discovery datasets. The method corrects technical biases, helps characterize the activity of biological processes or cell types in the new samples and provides a prognosis of patient survival.
Affiliation(s)
- Petr V. Nazarov
  - Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Anke K. Wienecke-Baldacchino
  - Life Sciences Research Unit (LSRU), University of Luxembourg, L-4367 Belvaux, Luxembourg
  - Epidemiology and Microbial Genomics Unit, Department of Microbiology, Laboratoire National de Santé, Dudelange, Luxembourg
- Andrei Zinovyev
  - INSERM, U900, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, F-75006 Paris, France
- Urszula Czerwińska
  - INSERM, U900, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, F-75006 Paris, France
  - Centre de Recherches Interdisciplinaires, Université Paris Descartes, Paris, France
- Arnaud Muller
  - Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Gunnar Dittmar
  - Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Francisco Azuaje
  - Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Stephanie Kreis
  - Life Sciences Research Unit (LSRU), University of Luxembourg, L-4367 Belvaux, Luxembourg
38
Yang ZY, Liu XY, Shu J, Zhang H, Ren YQ, Xu ZB, Liang Y. Multi-view based integrative analysis of gene expression data for identifying biomarkers. Sci Rep 2019; 9:13504. [PMID: 31534156 PMCID: PMC6751173 DOI: 10.1038/s41598-019-49967-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2019] [Accepted: 08/30/2019] [Indexed: 01/05/2023]
Abstract
The widespread application of microarray technology has produced a vast quantity of publicly available gene expression datasets. However, analysis of gene expression data using biostatistics and machine learning approaches is a challenging task due to (1) high noise; (2) small sample size with high dimensionality; (3) batch effects and (4) low reproducibility of significant biomarkers. These issues reveal the complexity of gene expression data and significantly obstruct the use of microarray technology in clinical applications. Integrative analysis offers an opportunity to address these issues and provides a more comprehensive understanding of biological systems, but current methods have several limitations. This work leverages state-of-the-art machine learning developments for the integration of multiple gene expression datasets, classification and identification of significant biomarkers. We design a novel integrative framework, MVIAm (Multi-View based Integrative Analysis of microarray data for identifying biomarkers). It applies multiple cross-platform normalization methods to aggregate multiple datasets into a multi-view dataset and utilizes a robust learning mechanism, Multi-View Self-Paced Learning (MVSPL), for gene selection in cancer classification problems. We demonstrate the capabilities of MVIAm using simulated data and studies of breast cancer and lung cancer. It can be applied flexibly and is an effective tool for facing the four challenges of gene expression data analysis. Our proposed model makes microarray integrative analysis more systematic and expands its range of applications.
Affiliation(s)
- Zi-Yi Yang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Xiao-Ying Liu
  - Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, 519090, China
- Jun Shu
  - School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Hui Zhang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Yan-Qiong Ren
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Zong-Ben Xu
  - School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Yong Liang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
39
Abstract
We introduce and evaluate the oblique random survival forest (ORSF). The ORSF is an ensemble method for right-censored survival data that uses linear combinations of input variables to recursively partition a set of training data. Regularized Cox proportional hazard models are used to identify linear combinations of input variables in each recursive partitioning step. Benchmark results using simulated and real data indicate that the ORSF's predicted risk function has high prognostic value in comparison to random survival forests, conditional inference forests, regression, and boosting. In an application to data from the Jackson Heart Study, we demonstrate variable and partial dependence using the ORSF and highlight characteristics of its 10-year predicted risk function for atherosclerotic cardiovascular disease events (ASCVD; stroke, coronary heart disease). We present visualizations comparing variable and partial effect estimation according to the ORSF, the conditional inference forest, and the Pooled Cohort Risk equations. The obliqueRSF R package, which provides functions to fit the ORSF and create variable and partial dependence plots, is available on the comprehensive R archive network (CRAN).
Affiliation(s)
- Mario Sims
  - University of Mississippi Medical Center
- Yuan-I Min
  - University of Mississippi Medical Center
40
Buzdin A, Sorokin M, Garazha A, Glusker A, Aleshin A, Poddubskaya E, Sekacheva M, Kim E, Gaifullin N, Giese A, Seryakov A, Rumiantsev P, Moshkovskii S, Moiseev A. RNA sequencing for research and diagnostics in clinical oncology. Semin Cancer Biol 2019; 60:311-323. [PMID: 31412295 DOI: 10.1016/j.semcancer.2019.07.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 06/30/2019] [Accepted: 07/16/2019] [Indexed: 12/26/2022]
Abstract
Molecular diagnostics is becoming one of the major drivers of personalized oncology. With hundreds of different approved anticancer drugs and regimens of their administration, selecting the proper treatment for a patient is at least a nontrivial task. This is especially true for recurrent and metastatic cancers where the standard lines of therapy have failed. Recent trials demonstrated that mutation assays are strongly limited in the personalized selection of therapeutics; consequently, most of the drugs cannot be ranked and only a small percentage of patients can benefit from the screening. Other approaches are therefore needed to address the problem of finding proper targeted therapies. The analysis of RNA expression (transcriptomic) profiles presents a reasonable solution because transcriptomics stands a few steps closer to the tumor phenotype than genome analysis. Several recent studies pioneered the use of transcriptomics in practical oncology and showed truly encouraging clinical results. The possibility of directly measuring the expression levels of molecular drug targets and profiling the activation of the relevant molecular pathways enables personalized prioritization for all types of molecular-targeted therapies. RNA sequencing is the most robust tool for high-throughput quantitative transcriptomics. Its use, potential, and limitations in clinical oncology are reviewed here, along with technical aspects such as optimal types of biosamples, RNA sequencing profile normalization, quality controls and several levels of data analysis.
Collapse
Affiliation(s)
- Anton Buzdin
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Maxim Sorokin
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Omicsway Corp., Walnut, CA, USA; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Alex Aleshin
- Stanford University School of Medicine, Stanford, 94305, CA, USA
- Elena Poddubskaya
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Vitamed Oncological Clinics, Moscow, Russia
- Marina Sekacheva
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
- Johannes Gutenberg University Mainz, Mainz, Germany
- Nurshat Gaifullin
- Lomonosov Moscow State University, Faculty of Medicine, Moscow, Russia
- Sergey Moshkovskii
- Institute of Biomedical Chemistry, Moscow, 119121, Russia; Pirogov Russian National Research Medical University (RNRMU), Moscow, 117997, Russia
- Alexey Moiseev
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
|
41
|
Borisov N, Buzdin A. New Paradigm of Machine Learning (ML) in Personalized Oncology: Data Trimming for Squeezing More Biomarkers From Clinical Datasets. Front Oncol 2019; 9:658. [PMID: 31380288 PMCID: PMC6650540 DOI: 10.3389/fonc.2019.00658] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/22/2019] [Accepted: 07/05/2019] [Indexed: 11/13/2022] Open
Affiliation(s)
- Nicolas Borisov
- Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Anton Buzdin
- Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia; Department of Genomics and Postgenomic Technologies, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
|
42
|
Abstract
BACKGROUND Drought is a severe environmental stress. It is estimated that about 50% of the world rice production is affected mainly by drought. Apart from conventional breeding strategies to develop drought-tolerant crops, innovative computational approaches may provide insights into the underlying molecular mechanisms of stress response and identify drought-responsive markers. Here we propose a network-based computational approach involving a meta-analytic study of seven drought-tolerant rice genotypes under drought stress. RESULTS Co-expression networks enable large-scale analysis of gene-pair associations and tightly coupled clusters that may represent coordinated biological processes. Considering differentially expressed genes in the co-expressed modules and supplementing external information such as resistance/tolerance QTLs, transcription factors, network-based topological measures, we identify and prioritize drought-adaptive co-expressed gene modules and potential candidate genes. Using the candidate genes that are well-represented across the datasets as 'seed' genes, two drought-specific protein-protein interaction networks (PPINs) are constructed with up- and down-regulated genes. Cluster analysis of the up-regulated PPIN revealed ABA signalling pathway as a central process in drought response with a probable crosstalk with energy metabolic processes. Tightly coupled gene clusters representing up-regulation of core cellular respiratory processes and enhanced degradation of branched chain amino acids and cell wall metabolism are identified. Cluster analysis of down-regulated PPIN provides a snapshot of major processes associated with photosynthesis, growth, development and protein synthesis, most of which are shut down during drought. Differential regulation of phytohormones, e.g., jasmonic acid, cell wall metabolism, signalling and posttranslational modifications associated with biotic stress are elucidated. 
Functional characterization of topologically important, drought-responsive uncharacterized genes that may play a role in important processes such as ABA signalling, calcium signalling, photosynthesis and cell wall metabolism is discussed. Further transgenic studies on these genes may help in elucidating their biological role under stress conditions. CONCLUSION Currently, a large number of resources for rice functional genomics exist which are mostly underutilized by the scientific community. In this study, a computational approach integrating information from various resources such as gene co-expression networks, protein-protein interactions and pathway-level information is proposed to provide a systems-level view of complex drought-responsive processes across the drought-tolerant genotypes.
Affiliation(s)
- Sanchari Sircar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
43
|
Acevedo A, Berthel A, DuBois D, Almon RR, Jusko WJ, Androulakis IP. Pathway-Based Analysis of the Liver Response to Intravenous Methylprednisolone Administration in Rats: Acute Versus Chronic Dosing. Gene Regul Syst Bio 2019; 13:1177625019840282. [PMID: 31019365 PMCID: PMC6466473 DOI: 10.1177/1177625019840282] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/14/2019] [Accepted: 03/05/2019] [Indexed: 12/25/2022]
Abstract
Pharmacological time-series data, from comparative dosing studies, are critical to characterizing drug effects. Reconciling the data from multiple studies is inevitably difficult; multiple in vivo high-throughput -omics studies are necessary to capture the global and temporal effects of the drug, but these experiments, though analogous, differ in (microarray or other) platforms, time-scales, and dosing regimens and thus cannot be directly combined or compared. This investigation addresses this reconciliation issue with a meta-analysis technique aimed at assessing the intrinsic activity at the pathway level. The purpose of this is to characterize the dosing effects of methylprednisolone (MPL), a widely used anti-inflammatory and immunosuppressive corticosteroid (CS), within the liver. A multivariate decomposition approach is applied to analyze acute and chronic MPL dosing in male adrenalectomized rats and characterize the dosing-dependent differences in the dynamic response of MPL-responsive signaling and metabolic pathways. We demonstrate how to deconstruct signaling and metabolic pathways into their constituent pathway activities, activities which are scored for intrinsic pathway activity. Dosing-induced changes in the dynamics of pathway activities are compared using a model-based assessment of pathway dynamics, extending the principles of pharmacokinetics/pharmacodynamics (PKPD) to describe pathway activities. The model-based approach enabled us to hypothesize on the likely emergence (or disappearance) of indirect dosing-dependent regulatory interactions, pointing to likely mechanistic implications of dosing of MPL transcriptional regulation. Both acute and chronic MPL administration induced a strong core of activity within pathway families including the following: lipid metabolism, amino acid metabolism, carbohydrate metabolism, metabolism of cofactors and vitamins, regulation of essential organelles, and xenobiotic metabolism pathway families. 
Pathway activities alter between acute and chronic dosing, indicating that MPL response is dosing dependent. Furthermore, because multiple pathway activities are dominant within a single pathway, we observe that pathways cannot be defined by a single response. Instead, pathways are defined by multiple, complex, and temporally related activities corresponding to different subgroups of genes within each pathway.
Affiliation(s)
- Alison Acevedo
- Department of Biomedical Engineering, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Ana Berthel
- Department of Biochemistry, Mount Holyoke College, South Hadley, MA, USA
- Debra DuBois
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- Department of Biological Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- Richard R Almon
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- Department of Biological Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- William J Jusko
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- Department of Biological Sciences, The State University of New York at Buffalo, Buffalo, NY, USA
- Ioannis P Androulakis
- Department of Biomedical Engineering, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Department of Chemical and Biochemical Engineering, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Department of Surgery, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
|
44
|
Tan TZ, Rouanne M, Tan KT, Huang RYJ, Thiery JP. Molecular Subtypes of Urothelial Bladder Cancer: Results from a Meta-cohort Analysis of 2411 Tumors. Eur Urol 2019. [DOI: 10.1016/j.eururo.2018.08.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
|
45
|
Borisov N, Shabalina I, Tkachev V, Sorokin M, Garazha A, Pulin A, Eremin II, Buzdin A. Shambhala: a platform-agnostic data harmonizer for gene expression data. BMC Bioinformatics 2019; 20:66. [PMID: 30727942 PMCID: PMC6366102 DOI: 10.1186/s12859-019-2641-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/14/2018] [Accepted: 01/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. Results Unlike previously published methods, which enable good-quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparisons. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. Conclusion Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method supporting sample-specific and platform-independent biologically meaningful clustering for the data obtained from multiple experimental platforms.
Affiliation(s)
- Nicolas Borisov
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Irina Shabalina
- Faculty of Mathematics and Information Technologies, Petrozavodsk State University, Anokhina str., 20, Petrozavodsk, 185910, Russia
- Victor Tkachev
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA
- Maxim Sorokin
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
- Andrew Garazha
- Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Laboratory of Bioinformatics, Oncology and Immunology, D. Rogachyov Federal Research Center of Pediatric Hematology, Moscow, 117198, Russia
- Andrey Pulin
- Laboratory for Cell Biology and Developmental Pathology, Federal State Institution "Institute of General Pathology and Pathophysiology", FSBSI "IGPP", Moscow, Russia
- Ilya I Eremin
- Department for Regenerative Medicine, JSC Generium, Moscow, Russia
- Anton Buzdin
- I.M. Sechenov First Moscow State Medical University, Sechenov University, Moscow, 119991, Russia; Department of bioinformatics and molecular networks, OmicsWay Corporation, Walnut, CA, USA; Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, 117997, Russia
|
46
|
Subbannayya T, Leal-Rojas P, Zhavoronkov A, Ozerov IV, Korzinkin M, Babu N, Radhakrishnan A, Chavan S, Raja R, Pinto SM, Patil AH, Barbhuiya MA, Kumar P, Guerrero-Preston R, Navani S, Tiwari PK, Kumar RV, Prasad TSK, Roa JC, Pandey A, Sidransky D, Gowda H, Izumchenko E, Chatterjee A. PIM1 kinase promotes gallbladder cancer cell proliferation via inhibition of proline-rich Akt substrate of 40 kDa (PRAS40). J Cell Commun Signal 2019; 13:163-177. [PMID: 30666556 DOI: 10.1007/s12079-018-00503-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/26/2018] [Accepted: 12/17/2018] [Indexed: 12/23/2022] Open
Abstract
Gallbladder cancer (GBC) is a rare malignancy, associated with poor disease prognosis with a 5-year survival of only 20%. This has been attributed to late presentation of the disease, lack of early diagnostic markers and limited efficacy of therapeutic interventions. Elucidation of molecular events in GBC can contribute to better management of the disease by aiding in the identification of therapeutic targets. To identify aberrantly activated signaling events in GBC, tandem mass tag-based quantitative phosphoproteomic analysis of five GBC cell lines was carried out. Proline-rich Akt substrate 40 kDa (PRAS40) was one of the proteins found to be hyperphosphorylated in all the invasive GBC cell lines. Tissue microarray-based immunohistochemical labeling of phospho-PRAS40 (T246) revealed moderate to strong staining in 77% of the primary gallbladder adenocarcinoma cases. Regulation of PRAS40 activity by inhibiting its upstream kinase PIM1 resulted in a significant decrease in cell proliferation, colony forming and invasive ability of GBC cells. Our results support the role of PRAS40 phosphorylation in GBC cell survival and aggressiveness. This study also elucidates phospho-PRAS40 as a clinical marker in GBC and the role of PIM1 as a therapeutic target in GBC.
Affiliation(s)
- Tejaswini Subbannayya
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India
- Pamela Leal-Rojas
- Center of Excellence in Translational Medicine (CEMT) & Scientific and Technological Bioresource Nucleus (BIOREN), Universidad de La Frontera, Temuco, Chile; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Alex Zhavoronkov
- Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, 21218, USA
- Ivan V Ozerov
- Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, 21218, USA
- Mikhail Korzinkin
- Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, MD, 21218, USA
- Niraj Babu
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India; Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Aneesha Radhakrishnan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India
- Sandip Chavan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India
- Remya Raja
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India
- Sneha M Pinto
- Center for Systems Biology and Molecular Medicine, Yenepoya (Deemed to be University), Mangalore, Karnataka, 575018, India
- Arun H Patil
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India; School of Biotechnology, KIIT (Deemed to be University), Bhubaneswar, Odisha, 751024, India
- Mustafa A Barbhuiya
- Department of Pathology and Laboratory Medicine, Pennsylvania State University College of Medicine, Hershey, PA, USA
- Prashant Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India
- Rafael Guerrero-Preston
- Department of Otolaryngology, Head and Neck Surgery, The Johns Hopkins University School of Medicine, 1550 Orleans Street, CRB II, 5M05C, Baltimore, MD, 21231, USA
- Pramod K Tiwari
- Centre for Genomics, Molecular and Human Genetics, Jiwaji University, Gwalior, 474011, India; School of Studies in Zoology, Jiwaji University, Gwalior, 474011, India
- Rekha Vijay Kumar
- Department of Pathology, Kidwai Memorial Institute of Oncology, Bangalore, Karnataka, 560029, India
- T S Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India; Center for Systems Biology and Molecular Medicine, Yenepoya (Deemed to be University), Mangalore, Karnataka, 575018, India; NIMHANS-IOB Proteomics and Bioinformatics Laboratory, Neurobiology Research Centre, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Juan Carlos Roa
- Department of Pathology, Millenium Institute on Immunology and Immunotherapy (IMII), Pontificia Universidad Católica de Chile, Santiago, Chile
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- David Sidransky
- Department of Otolaryngology, Head and Neck Surgery, The Johns Hopkins University School of Medicine, 1550 Orleans Street, CRB II, 5M05C, Baltimore, MD, 21231, USA
- Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India; Center for Systems Biology and Molecular Medicine, Yenepoya (Deemed to be University), Mangalore, Karnataka, 575018, India
- Evgeny Izumchenko
- Department of Otolaryngology, Head and Neck Surgery, The Johns Hopkins University School of Medicine, 1550 Orleans Street, CRB II, 5M05C, Baltimore, MD, 21231, USA
- Aditi Chatterjee
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, 560066, India; Center for Systems Biology and Molecular Medicine, Yenepoya (Deemed to be University), Mangalore, Karnataka, 575018, India
|
47
|
Long NP, Park S, Anh NH, Nghi TD, Yoon SJ, Park JH, Lim J, Kwon SW. High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer. Int J Mol Sci 2019; 20:E296. [PMID: 30642095 PMCID: PMC6358915 DOI: 10.3390/ijms20020296] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/30/2018] [Revised: 12/31/2018] [Accepted: 01/04/2019] [Indexed: 02/07/2023] Open
Abstract
The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal cancer (CRC). Different random forests (RF)-based feature selection methods, including the area under the curve (AUC)-RF, Boruta, and Vita, were used, and the diagnostic performance of the proposed biosignatures was benchmarked using RF, logistic regression, naïve Bayes, and k-nearest neighbors models. All models showed satisfactory performance, with RF appearing to be the best. For instance, the RF model achieved a mean accuracy of 0.998 (standard deviation (SD) < 0.003), a mean specificity of 0.999 (SD < 0.003), and a mean sensitivity of 0.998 (SD < 0.004). Moreover, the proposed biomarker signatures were highly associated with multifaceted hallmarks of cancer. Some biomarkers were found to be enriched in epithelial cell signaling in Helicobacter pylori infection and inflammatory processes. The overexpression of TGFBI and S100A2 was associated with poor disease-free survival, while the down-regulation of NR5A2, SLC4A4, and CD177 was linked to worse overall survival of the patients. In conclusion, novel transcriptome signatures to improve the diagnostic accuracy in CRC are introduced for further validation in various clinical settings.
Affiliation(s)
- Nguyen Phuoc Long
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Seongoh Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Nguyen Hoang Anh
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Tran Diem Nghi
- School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Sang Jun Yoon
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Jeong Hill Park
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sung Won Kwon
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
|
48
|
Long NP, Park S, Anh NH, Min JE, Yoon SJ, Kim HM, Nghi TD, Lim DK, Park JH, Lim J, Kwon SW. Efficacy of Integrating a Novel 16-Gene Biomarker Panel and Intelligence Classifiers for Differential Diagnosis of Rheumatoid Arthritis and Osteoarthritis. J Clin Med 2019; 8:E50. [PMID: 30621359 PMCID: PMC6352223 DOI: 10.3390/jcm8010050] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/19/2018] [Revised: 12/20/2018] [Accepted: 01/02/2019] [Indexed: 12/15/2022] Open
Abstract
Introducing novel biomarkers for accurately detecting and differentiating rheumatoid arthritis (RA) and osteoarthritis (OA) using clinical samples is essential. In the current study, we searched for a novel data-driven gene signature of synovial tissues to differentiate RA from OA patients. Fifty-three RA, 41 OA, and 25 normal microarray-based transcriptome samples were utilized. The area under the curve random forests (RF) variable importance measurement was applied to seek the most influential differential genes between RA and OA. Five algorithms, including RF, k-nearest neighbors (kNN), support vector machines (SVM), naïve-Bayes, and a tree-based method, were employed for the classification. We found a 16-gene signature that could effectively differentiate RA from OA, including TMOD1, POP7, SGCA, KLRD1, ALOX5, RAB22A, ANK3, PTPN3, GZMK, CLU, GZMB, FBXL7, TNFRSF4, IL32, MXRA7, and CD8A. The externally validated accuracy of the RF model was 0.96 (sensitivity = 1.00, specificity = 0.90). Likewise, the accuracy of kNN, SVM, naïve-Bayes, and decision tree was 0.96, 0.96, 0.96, and 0.91, respectively. Functional meta-analysis exhibited the differential pathological processes of RA and OA and suggested promising targets for further mechanistic and therapeutic studies. In conclusion, the proposed genetic signature combined with sophisticated classification methods may improve the diagnosis and management of RA patients.
Affiliation(s)
- Nguyen Phuoc Long
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Seongoh Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Nguyen Hoang Anh
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Jung Eun Min
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Sang Jun Yoon
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Hyung Min Kim
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Tran Diem Nghi
- School of Medicine, Vietnam National University, Ho Chi Minh 700000, Vietnam
- Dong Kyu Lim
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Jeong Hill Park
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sung Won Kwon
- College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
|
49
|
Abstract
Background Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, the biomarkers inferred from different datasets suffer from a lack of reproducibility due to the heterogeneity of the data generated from different platforms or laboratories. This motivates us to develop robust biomarker identification methods by integrating multiple datasets. Methods In this paper, we developed an integrative method for classification based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences of the constant terms within the same dataset, both the homogeneity within the same dataset and the heterogeneity across multiple datasets can be kept. The model is formulated as an optimization problem with a network penalty measuring the differences of the constant terms. The L1 penalty, elastic penalty and network-related penalties are added to the objective function for the purpose of biomarker discovery. Algorithms based on the proximal Newton method are proposed to solve the optimization problem. Results We first applied the proposed method to the simulated datasets. Both the AUC of the prediction and the biomarker identification accuracy are improved. We then applied the method to two breast cancer gene expression datasets. By integrating both datasets, the prediction AUC is improved over directly merging the datasets and over MetaLasso, and it is comparable to the best AUC obtained when doing biomarker identification in an individual dataset. The biomarkers identified using the network-related penalty for variables were further analyzed. Meaningful subnetworks enriched by breast cancer were identified. Conclusion A network-based integrative logistic regression model is proposed in the paper. It improves both the prediction and biomarker identification accuracy.
Affiliation(s)
- Ke Zhang
- School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
- Wei Geng
- School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
- Shuqin Zhang
- Center for Computational Systems Biology, Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
|
50
|
Hollern DP, Contreras CM, Dance-Barnes S, Silva GO, Pfefferle AD, Xiong J, Darr DB, Usary J, Mott KR, Perou CM. A mouse model featuring tissue-specific deletion of p53 and Brca1 gives rise to mammary tumors with genomic and transcriptomic similarities to human basal-like breast cancer. Breast Cancer Res Treat 2018; 174:143-155. [PMID: 30484104 PMCID: PMC6418066 DOI: 10.1007/s10549-018-5061-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/09/2018] [Accepted: 11/16/2018] [Indexed: 12/20/2022]
Abstract
Purpose and methods In human basal-like breast cancer, mutations and deletions in TP53 and BRCA1 are frequent oncogenic events. Thus, we interbred mice expressing the CRE-recombinase with mice harboring loxP sites at TP53 and BRCA1 (K14-Cre; p53f/f Brca1f/f) to test the hypothesis that tissue-specific deletion of TP53 and BRCA1 would give rise to tumors reflective of human basal-like breast cancer. Results In support of our hypothesis, these transgenic mice developed tumors that express basal-like cytokeratins and demonstrated intrinsic gene expression features similar to human basal-like tumors. Array comparative genomic hybridization revealed a striking conservation of copy number alterations between the K14-Cre; p53f/f Brca1f/f mouse model and human basal-like breast cancer. Conserved events included MYC amplification, KRAS amplification, and RB1 loss. Microarray analysis demonstrated that these DNA copy number events also led to corresponding changes in signatures of pathway activation including high proliferation due to RB1 loss. K14-Cre; p53f/f Brca1f/f also matched human basal-like breast cancer for a propensity to have immune cell infiltrates. Given the long latency of K14-Cre; p53f/f Brca1f/f tumors (~250 days), we created tumor syngeneic transplant lines, as well as in vitro cell lines, which were tested for sensitivity to carboplatin and paclitaxel. These therapies invoked acute regression, extended overall survival, and resulted in gene expression signatures of an anti-tumor immune response. Conclusion These findings demonstrate that this model is a valuable preclinical resource for the study of human basal-like breast cancer.
Affiliation(s)
- Daniel P Hollern
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Cristina M Contreras
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Stephanie Dance-Barnes
  - Department of Biological Sciences, Winston Salem State University, Winston-Salem, NC, 27110, USA
- Grace O Silva
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Adam D Pfefferle
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Jessie Xiong
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- David B Darr
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA
- Jerry Usary
  - Arrow Genomics LLC, Chapel Hill, NC, 27517, USA
- Kevin R Mott
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Charles M Perou
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, CB#7264, Chapel Hill, NC, 27599, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA