1
Cross-Platform Omics Prediction procedure: a statistical machine learning framework for wider implementation of precision medicine. NPJ Digit Med 2022; 5:85. [PMID: 35788693 PMCID: PMC9253123 DOI: 10.1038/s41746-022-00618-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2021] [Accepted: 05/19/2022] [Indexed: 11/17/2022] Open
Abstract
In this modern era of precision medicine, molecular signatures identified from advanced omics technologies hold great promise to better guide clinical decisions. However, current approaches are often location-specific due to the inherent differences between platforms and across multiple centres, thus limiting the transferability of molecular signatures. We present Cross-Platform Omics Prediction (CPOP), a penalised regression model that can use omics data to predict patient outcomes in a platform-independent manner and across time and experiments. CPOP improves on the traditional prediction framework of using gene-based features by selecting ratio-based features with similar estimated effect sizes. These components gave CPOP the ability to have a stable performance across datasets of similar biology, minimising the effect of technical noise often generated by omics platforms. We present a comprehensive evaluation using melanoma transcriptomics data to demonstrate its potential to be used as a critical part of a clinical screening framework for precision medicine. Additional assessment of generalisation was demonstrated with ovarian cancer and inflammatory bowel disease studies.
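The ratio-based features mentioned in this abstract can be illustrated with a minimal sketch. It assumes (the abstract does not say so) that a ratio feature is the difference in log-expression between two genes. The gene names and values are made up, and `log_ratio_features` is a hypothetical helper, not part of the CPOP software. The sketch only shows why a sample-wide multiplicative factor, such as a platform-dependent scale, cancels in every ratio.

```python
import math
from itertools import combinations

def log_ratio_features(expr):
    """Map {gene: positive expression} to {(gene_a, gene_b): log(a / b)}."""
    genes = sorted(expr)
    return {
        (a, b): math.log(expr[a]) - math.log(expr[b])
        for a, b in combinations(genes, 2)
    }

# Made-up expression values for one sample (illustrative only).
sample = {"G1": 120.0, "G2": 30.0, "G3": 60.0}
# The same sample as a hypothetical second platform might report it,
# with every gene measured 2.5 times higher.
rescaled = {gene: 2.5 * value for gene, value in sample.items()}

features = log_ratio_features(sample)
features_rescaled = log_ratio_features(rescaled)

# Every pairwise log-ratio is unchanged by the common scale factor.
unchanged = all(
    math.isclose(features[k], features_rescaled[k], abs_tol=1e-12)
    for k in features
)
print(unchanged)
```

Gene-level features would differ by log(2.5) between the two versions of the sample, which is the kind of platform effect a ratio-based model is meant to avoid.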
2
Li Q, Lei S, Luo X, He J, Fang Y, Yang H, Liu Y, Deng CY, Wu S, Xue YM, Rao F. Construction of Prediction Model for Atrial Fibrillation with Valvular Heart Disease Based on Machine Learning. Rev Cardiovasc Med 2022; 23:247. [PMID: 39076905 PMCID: PMC11266776 DOI: 10.31083/j.rcm2307247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/31/2022] [Accepted: 06/10/2022] [Indexed: 07/31/2024] Open
Abstract
Background: Valvular heart disease (VHD) is a major precipitating factor of atrial fibrillation (AF) that contributes to decreased cardiac function, heart failure, and stroke. Stroke induced by VHD combined with atrial fibrillation (AF-VHD) is a much more serious condition than VHD alone. The aim of this study was to explore the molecular mechanism governing VHD progression and to provide candidate treatment targets for AF-VHD.
Methods: Four public mRNA microarray datasets were downloaded and screened for differentially expressed genes (DEGs). Weighted gene correlation network analysis was carried out to detect key modules and explore their relationships with disease status. Candidate hub signature genes were then screened within the key module using machine learning methods. The receiver operating characteristic curve and nomogram model analysis were used to determine the potential clinical significance of the hub genes. Subsequently, target gene protein levels in independent human atrial tissue samples were detected using western blotting. Specific expression analysis of the hub genes in the tissue and cell samples was performed using single-cell sequencing analysis in the Human Protein Atlas tool.
Results: A total of 819 common DEGs in combined datasets were screened. Fourteen modules were identified using the cut tree dynamic function. The cyan and purple modules were considered the most clinically significant for AF-VHD. Then, 25 hub genes in the cyan and purple modules were selected for further analysis. The pathways related to dilated cardiomyopathy, hypertrophic cardiomyopathy, and heart contraction were concentrated in the purple and cyan modules of the AF-VHD. Genes of importance (CSRP3, MCOLN3, SLC25A5, and FIBP) were then identified based on machine learning. Of these, CSRP3 had a potential clinical significance and was specifically expressed in the heart tissue.
Conclusions: The identified genes may play critical roles in the pathophysiological process of AF-VHD, providing new insights into the progression from VHD to AF and helping to determine potential biomarkers and therapeutic targets for treating AF-VHD.
Affiliation(s)
- Qiaoqiao Li
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Shenghong Lei
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Xueshan Luo
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Jintao He
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Yuan Fang
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Hui Yang
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Yang Liu
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Chun-Yu Deng
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Shulin Wu
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Yu-Mei Xue
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Fang Rao
- Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
- Research Center of Medical Sciences, Provincial Key Laboratory of Clinical Pharmacology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, 510080 Guangzhou, Guangdong, China
3
Watson AW, Grant AD, Parker SS, Hill S, Whalen MB, Chakrabarti J, Harman MW, Roman MR, Forte BL, Gowan CC, Castro-Portuguez R, Stolze LK, Franck C, Cusanovich DA, Zavros Y, Padi M, Romanoski CE, Mouneimne G. Breast tumor stiffness instructs bone metastasis via maintenance of mechanical conditioning. Cell Rep 2021; 35:109293. [PMID: 34192535 PMCID: PMC8312405 DOI: 10.1016/j.celrep.2021.109293] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 09/09/2020] [Revised: 02/26/2021] [Accepted: 06/03/2021] [Indexed: 11/14/2022] Open
Abstract
While the immediate and transitory response of breast cancer cells to pathological stiffness in their native microenvironment has been well explored, it remains unclear how stiffness-induced phenotypes are maintained over time after cancer cell dissemination in vivo. Here, we show that fibrotic-like matrix stiffness promotes distinct metastatic phenotypes in cancer cells, which are preserved after transition to softer microenvironments, such as bone marrow. Using differential gene expression analysis of stiffness-responsive breast cancer cells, we establish a multigenic score of mechanical conditioning (MeCo) and find that it is associated with bone metastasis in patients with breast cancer. The maintenance of mechanical conditioning is regulated by RUNX2, an osteogenic transcription factor, established driver of bone metastasis, and mitotic bookmarker that preserves chromatin accessibility at target gene loci. Using genetic and functional approaches, we demonstrate that mechanical conditioning maintenance can be simulated, repressed, or extended, with corresponding changes in bone metastatic potential. Watson et al. demonstrate that mechanical conditioning by stiff microenvironments in breast tumors is maintained in cancer cells after dissemination to softer microenvironments, including bone marrow. They show that mechanical conditioning promotes invasion and osteolysis and establish a mechanical conditioning (MeCo) score, associated with bone metastasis in patients.
Affiliation(s)
- Adam W Watson
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; MeCo Diagnostics, Tucson, AZ 85718, USA
- Adam D Grant
- University of Arizona Cancer Center, Tucson, AZ 85724, USA
- Sara S Parker
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Samantha Hill
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Michael B Whalen
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Jayati Chakrabarti
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Michael W Harman
- School of Engineering, Brown University, Providence, RI 02912, USA
- Cody C Gowan
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Lindsey K Stolze
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Christian Franck
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Darren A Cusanovich
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Yana Zavros
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Megha Padi
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA; Bioinformatics Shared Resource, University of Arizona Cancer Center, Tucson, AZ 85724, USA
- Casey E Romanoski
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
- Ghassan Mouneimne
- University of Arizona Cancer Center, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ 85724, USA
4
Chen T, Tyagi S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 2020; 9:giaa064. [PMID: 32543653 PMCID: PMC7297091 DOI: 10.1093/gigascience/giaa064] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/25/2020] [Revised: 05/25/2020] [Accepted: 05/26/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scopes or limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets.
RESULTS: In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework.
CONCLUSIONS: A sufficiently advanced universal harmonizer has major medical implications: (1) identifying dysregulated biological pathways responsible for a disease is a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Affiliation(s)
- Tyrone Chen
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Sonika Tyagi
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
5
Altenbuchinger M, Weihs A, Quackenbush J, Grabe HJ, Zacharias HU. Gaussian and Mixed Graphical Models as (multi-)omics data analysis tools. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2020; 1863:194418. [PMID: 31639475 PMCID: PMC7166149 DOI: 10.1016/j.bbagrm.2019.194418] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/31/2019] [Revised: 08/21/2019] [Accepted: 08/21/2019] [Indexed: 11/30/2022]
Abstract
Gaussian Graphical Models (GGMs) are tools to infer dependencies between biological variables. Popular applications are the reconstruction of gene, protein, and metabolite association networks. GGMs are an exploratory research tool that can be useful to discover interesting relations between genes (functional clusters) or to identify therapeutically interesting genes, but do not necessarily infer a network in the mechanistic sense. Although GGMs are well investigated from a theoretical and applied perspective, important extensions are not well known within the biological community. GGMs assume, for instance, multivariate normally distributed data. If this assumption is violated, Mixed Graphical Models (MGMs) can be the better choice. In this review, we provide the theoretical foundations of GGMs, present extensions such as MGMs or multi-class GGMs, and illustrate how those methods can provide insight into biological mechanisms. We summarize several applications and present user-friendly estimation software. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
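How a GGM encodes dependencies can be sketched with the standard relation between the precision (inverse covariance) matrix Ω and partial correlations, ρ_ij = -Ω_ij / sqrt(Ω_ii Ω_jj). A zero entry means the two variables are conditionally independent given the others, so no edge is drawn. The matrix below is a made-up example for a three-variable chain, and `partial_correlations` is a hypothetical helper, not taken from any software mentioned in the review. Estimating Ω from data is a separate step that this sketch does not cover.

```python
import math

def partial_correlations(precision):
    """Partial correlation for each variable pair from a precision matrix."""
    p = len(precision)
    return {
        (i, j): -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
        for i in range(p)
        for j in range(i + 1, p)
    }

# Made-up precision matrix for a chain 0 - 1 - 2 (illustrative values only).
omega = [
    [1.0, -0.5, 0.0],
    [-0.5, 1.25, -0.5],
    [0.0, -0.5, 1.0],
]

pcor = partial_correlations(omega)
# An edge is kept wherever the partial correlation is non-zero.
edges = [pair for pair, r in pcor.items() if abs(r) > 1e-12]
print(edges)
```

Variables 0 and 2 end up unconnected because their precision entry is zero, even though they would be marginally correlated through variable 1.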
Affiliation(s)
- Michael Altenbuchinger
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Antoine Weihs
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
- John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Hans Jörgen Grabe
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany; German Center for Neurodegenerative Diseases DZNE, Site Rostock/Greifswald, 17475 Greifswald, Germany
- Helena U Zacharias
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
6
Platform independent protein-based cell-of-origin subtyping of diffuse large B-cell lymphoma in formalin-fixed paraffin-embedded tissue. Sci Rep 2020; 10:7876. [PMID: 32398793 PMCID: PMC7217957 DOI: 10.1038/s41598-020-64212-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/16/2019] [Accepted: 04/09/2020] [Indexed: 01/03/2023] Open
Abstract
Diffuse large B-cell lymphoma (DLBCL) is commonly classified by gene expression profiling according to its cell of origin (COO) into activated B-cell (ABC)-like and germinal center B-cell (GCB)-like subgroups. Here we report the application of label-free nano-liquid chromatography - Sequential Window Acquisition of all THeoretical fragment-ion spectra - mass spectrometry (nanoLC-SWATH-MS) to the COO classification of DLBCL in formalin-fixed paraffin-embedded (FFPE) tissue. To generate a protein signature capable of predicting Affymetrix-based GCB scores, the summed log2-transformed fragment ion intensities of 780 proteins quantified in a training set of 42 DLBCL cases were used as independent variables in a penalized zero-sum elastic net regression model with variable selection. The eight-protein signature obtained showed an excellent correlation (r = 0.873) between predicted and true GCB scores and yielded only 9 (21.4%) minor discrepancies between the three classifications: ABC, GCB, and unclassified. The robustness of the model was validated successfully in two independent cohorts of 42 and 31 DLBCL cases, the latter cohort comprising only patients aged >75 years, with Pearson correlation coefficients of 0.846 and 0.815, respectively, between predicted and NanoString nCounter based GCB scores. We further show that the 8-protein signature is directly transferable to both a triple quadrupole and a Q Exactive quadrupole-Orbitrap mass spectrometer, thus obviating the need for proprietary instrumentation and reagents. This method may therefore be used for robust and competitive classification of DLBCLs on the protein level.
7
Comparison of GeneChip, nCounter, and Real-Time PCR-Based Gene Expressions Predicting Locoregional Tumor Control after Primary and Postoperative Radiochemotherapy in Head and Neck Squamous Cell Carcinoma. J Mol Diagn 2020; 22:801-810. [PMID: 32247864 DOI: 10.1016/j.jmoldx.2020.03.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/19/2019] [Revised: 02/21/2020] [Accepted: 03/10/2020] [Indexed: 02/07/2023] Open
Abstract
This article compares the expression and applicability of biomarkers, from single genes and gene signatures, identified in patients with locally advanced head and neck squamous cell carcinoma using the GeneChip Human Transcriptome Array 2.0, nCounter, and real-time PCR analyses. Two multicenter, retrospective cohorts of patients with head and neck squamous cell carcinoma from the German Cancer Consortium Radiation Oncology Group who received postoperative radiochemotherapy or primary radiochemotherapy were considered. Real-time PCR was performed for a limited number of 38 genes, and only in the cohort that received postoperative radiochemotherapy. Correlations between the methods were evaluated by the Spearman rank correlation coefficient. Patients were stratified based on the expression of putative cancer stem cell markers, hypoxia-associated gene signatures, and a previously developed seven-gene signature. Locoregional tumor control was compared between these patient subgroups using log-rank tests. Gene expressions obtained from nCounter analyses were moderately correlated with those from GeneChip analyses (median ρ ≈ 0.68). A higher correlation was obtained between nCounter analyses and real-time PCR (median ρ = 0.84). Significant associations with locoregional tumor control were observed for most of the considered biomarkers evaluated by GeneChip and nCounter analyses. In general, all applied biomarkers (single genes and gene signatures) classified approximately 70% to 85% of the patients similarly. Overall, gene signatures seem to be more robust and had better transferability among different measurement methods.
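The Spearman rank correlation used to compare platforms in this abstract is the Pearson correlation of the ranks, which makes it insensitive to the different measurement scales of each method. The sketch below is a plain-Python illustration with made-up values for one gene across five samples. It assumes there are no tied values, and the function names are hypothetical rather than taken from the study's analysis code.

```python
import math

def ranks(values):
    """Rank values from 1 to n; assumes no ties (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up expression of one gene in five samples on two platforms.
platform_a = [5.1, 7.3, 6.0, 9.2, 8.4]
platform_b = [110.0, 260.0, 150.0, 900.0, 300.0]   # same ordering, other scale
platform_c = [110.0, 260.0, 150.0, 300.0, 900.0]   # top two samples swapped

rho_same_order = spearman(platform_a, platform_b)
rho_one_swap = spearman(platform_a, platform_c)
print(round(rho_same_order, 3), round(rho_one_swap, 3))
```

A median ρ across genes, as reported in the abstract, would come from repeating this per gene and taking the median of the resulting coefficients.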
8
Lausser L, Szekely R, Klimmek A, Schmid F, Kestler HA. Constraining classifiers in molecular analysis: invariance and robustness. J R Soc Interface 2020; 17:20190612. [PMID: 32019472 PMCID: PMC7061712 DOI: 10.1098/rsif.2019.0612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/03/2020] [Accepted: 01/09/2020] [Indexed: 12/02/2022] Open
Abstract
Analysing molecular profiles requires the selection of classification models that can cope with the high dimensionality and variability of these data. Also, improper reference point choice and scaling pose additional challenges. Often model selection is somewhat guided by ad hoc simulations rather than by sophisticated considerations on the properties of a categorization model. Here, we derive and report four linked linear concept classes/models with distinct invariance properties for high-dimensional molecular classification. We can further show that these concept classes also form a half-order of complexity classes in terms of Vapnik-Chervonenkis dimensions, which also implies increased generalization abilities. We implemented support vector machines with these properties. Surprisingly, we were able to attain comparable or even superior generalization abilities to the standard linear one on the 27 investigated RNA-Seq and microarray datasets. Our results indicate that a priori chosen invariant models can replace ad hoc robustness analysis by interpretable and theoretically guaranteed properties in molecular categorization.
Affiliation(s)
- Ludwig Lausser
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Robin Szekely
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Attila Klimmek
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Florian Schmid
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Hans A. Kestler
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Leibniz Institute on Aging, Jena, Germany
9
Expression Concordance of 325 Novel RNA Biomarkers between Data Generated by NanoString nCounter and Affymetrix GeneChip. DISEASE MARKERS 2019; 2019:1940347. [PMID: 31217830 PMCID: PMC6536986 DOI: 10.1155/2019/1940347] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/20/2018] [Revised: 02/09/2019] [Accepted: 02/15/2019] [Indexed: 02/06/2023]
Abstract
Background: With the development of new drug combinations and targeted treatments for multiple types of cancer, the ability to stratify categories of patient populations and to develop companion diagnostics has become increasingly important. A panel of 325 RNA biomarkers was selected based on cancer-related biological processes of healthy cells and gene expression changes over time during nonmalignant epithelial cell organization. This "cancer in reverse" approach resulted in a panel of biomarkers relevant for at least 7 cancer types, providing gene expression profiles representing key cellular signaling pathways beyond mutations in "driver genes."
Objective: To further investigate this biomarker panel, the objective of the current study is to (1) validate the assay reproducibility for the 325 RNA biomarkers and (2) compare gene expression profiles side by side using two technology platforms.
Methods and Results: We mapped the 325 RNA transcripts in a custom NanoString nCounter expression panel for comparison with all potential probe sets in the Affymetrix Human Genome U133 Plus 2.0. The experiments were conducted with 10 unique biological formalin-fixed paraffin-embedded (FFPE) breast tumor samples. Each site extracted RNA from four sections of 10-micron thick FFPE tissue over three different days by two different operators using an optimized standard operating procedure and quality control criteria. Samples were analyzed using mas5 in BioConductor and NanoStringNorm in R. Pearson correlation showed reproducibility between sites for all 60 samples with r = 0.995 for Affymetrix and r = 0.999 for NanoString. Across multiple days and multiple users, correlation ranged from r = 0.962 to 0.999 for Affymetrix and from r = 0.982 to 0.991 for NanoString.
Conclusion: The 325 RNA biomarkers showed reproducibility in two technology platforms with moderate to high concordance. Future directions include performing clinical validation studies and generating rationale for patient selection in clinical trials using the technically validated assay.
10
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. From hype to reality: data science enabling personalized medicine. BMC Med 2018; 16:150. [PMID: 30145981 PMCID: PMC6109989 DOI: 10.1186/s12916-018-1122-7] [Citation(s) in RCA: 208] [Impact Index Per Article: 29.7] [Received: 03/28/2018] [Accepted: 07/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often named Artificial Intelligence in the mainstream media). While in recent years there has been a lot of enthusiasm about the potential of 'big data' and machine learning-based solutions, only a few examples exist that impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties in interpreting complex model predictions, and lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future.
CONCLUSIONS: There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance further to provide direct benefit to clinical practice.
Affiliation(s)
- Holger Fröhlich
- UCB Biosciences GmbH, Alfred-Nobel-Str. 10, 40789 Monheim, Germany
- University of Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19c, 53115 Bonn, Germany
- Rudi Balling
- University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg
- Niko Beerenwinkel
- Department of Biosciences and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Oliver Kohlbacher
- University of Tübingen, WSI/ZBIT, Sand 14, 72076 Tübingen, Germany
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Institute for Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076 Tübingen, Germany
- Santosh Kumar
- Department of Computer Science, University of Memphis, 2222 Dunn Hall, Memphis, TN 38152 USA
- Thomas Lengauer
- Max-Planck-Institute for Informatics, 66123 Saarbrücken, Germany
- Marloes H. Maathuis
- ETH Zurich, Seminar für Statistik, Rämistrasse 101, 8092 Zurich, Switzerland
- Yves Moreau
- University of Leuven, ESAT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Susan A. Murphy
- Harvard University, Science Center 400 Suite, Oxford Street, Cambridge, MA 02138-2901 USA
- Teresa M. Przytycka
- National Center of Biotechnology Information, National Institute of Health, 8600 Rockville Pike, Bethesda, MD 20894-6075 USA
- Michael Rebhan
- Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
- Hannes Röst
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON M5S 3E1 Canada
- Andreas Schuppert
- RWTH Aachen, Joint Research Center for Computational Biomedicine, Pauwelsstrasse 19, 52074 Aachen, Germany
- Matthias Schwab
- Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Aucherbachstrasse 112, 70376 Stuttgart, Germany
- University of Tübingen, Departments of Clinical Pharmacology and of Pharmacy and Biochemistry, Tübingen, Germany
- Rainer Spang
- University of Regensburg, Institute of Functional Genomics, Am BioPark 9, 93053 Regensburg, Germany
- Daniel Stekhoven
- ETH Zurich, NEXUS Personalized Health Technol., Otto-Stern-Weg 7, 8093 Zurich, Switzerland
- Jimeng Sun
- Georgia Tech University, 801 Atlantic Drive, Atlanta, GA 30332-0280 USA
- Andreas Weber
- Institute for Computer Science, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany
- Daniel Ziemek
- Pfizer, Worldwide Research and Development, Linkstraße 10, 10785 Berlin, Germany
- Blaz Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
11
Zacharias HU, Rehberg T, Mehrl S, Richtmann D, Wettig T, Oefner PJ, Spang R, Gronwald W, Altenbuchinger M. Scale-Invariant Biomarker Discovery in Urine and Plasma Metabolite Fingerprints. J Proteome Res 2017; 16:3596-3605. [PMID: 28825821 DOI: 10.1021/acs.jproteome.7b00325] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
Metabolomics data is typically scaled to a common reference like a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such scaling of the data, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways. Here, we studied how both the outcome of hypothesis tests for differential metabolite concentration and the screening for multivariate metabolite signatures are affected by the choice of scale. To overcome this problem for metabolite signatures and to establish a scale-invariant biomarker discovery algorithm, we extended linear zero-sum regression to the logistic regression framework and showed in two applications to 1H NMR-based metabolomics data how this approach overcomes the scaling problem. Logistic zero-sum regression is available as an R package as well as a high-performance computing implementation that can be downloaded at https://github.com/rehbergT/zeroSum.
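Why a zero-sum constraint gives scale invariance can be shown with a few lines of arithmetic. The sketch assumes the features are log-transformed intensities, so that rescaling a sample by a factor c adds log(c) to every feature. The linear predictor then shifts by log(c) times the sum of the coefficients, which vanishes when the coefficients sum to zero. All numbers and names below are made up for illustration. This is not the zeroSum package's API, and no model fitting is performed.

```python
import math

def linear_predictor(intercept, coefs, log_features):
    """Intercept plus the weighted sum of log-transformed features."""
    return intercept + sum(b * x for b, x in zip(coefs, log_features))

raw = [4.0, 2.0, 8.0]                # made-up metabolite intensities
rescaled = [10.0 * v for v in raw]   # same sample on a different reference scale

log_raw = [math.log(v) for v in raw]
log_rescaled = [math.log(v) for v in rescaled]

zero_sum_coefs = [0.8, -0.5, -0.3]   # coefficients sum to zero
free_coefs = [0.8, -0.5, 0.3]        # coefficients sum to 0.6

# Change in the prediction caused purely by rescaling the sample.
shift_zero_sum = (linear_predictor(0.1, zero_sum_coefs, log_rescaled)
                  - linear_predictor(0.1, zero_sum_coefs, log_raw))
shift_free = (linear_predictor(0.1, free_coefs, log_rescaled)
              - linear_predictor(0.1, free_coefs, log_raw))
print(abs(shift_zero_sum) < 1e-9, round(shift_free, 4))
```

In the logistic setting described in the abstract the same linear predictor feeds a sigmoid, so an unchanged predictor means an unchanged predicted probability.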
Affiliation(s)
- Daniel Richtmann
- Department of Physics, University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany