1
Calafiore GC, Fracastoro G. Sparse ℓ1- and ℓ2-Center Classifiers. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:996-1009. [PMID: 33226955 DOI: 10.1109/tnnls.2020.3036838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
In this article, we discuss two novel sparse versions of the classical nearest-centroid classifier. The proposed sparse classifiers are based on ℓ1 and ℓ2 distance criteria, respectively, and perform simultaneous feature selection and classification by detecting the features that are most relevant for the classification purpose. We formally prove that the training of the proposed sparse models, with both distance criteria, can be performed exactly (i.e., the globally optimal set of features is selected) at a linear computational cost. Specifically, the proposed sparse classifiers are trained in O(mn) + O(m log k) operations, where n is the number of samples, m is the total number of features, and k ≤ m is the number of features to be retained in the classifier. Furthermore, the complexity of testing and classifying a new sample is simply O(k) for both methods. The proposed models can be employed either as stand-alone sparse classifiers or as fast feature-selection techniques for prefiltering the features to be later fed to other types of classifiers (e.g., SVMs). The experimental results show that the proposed methods are competitive in accuracy with state-of-the-art feature selection and classification techniques while having a substantially lower computational cost.
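The cost structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes two classes labelled 0 and 1, and it assumes, purely for illustration, that features are ranked by the absolute gap between the two class centroids. The function names `train_sparse_centroid` and `predict` are hypothetical.

```python
import heapq


def train_sparse_centroid(X, y, k):
    """Illustrative sparse nearest-centroid training (assumed ranking rule).

    X: list of n samples, each a list of m feature values.
    y: list of n labels, each 0 or 1 (both classes must be present).
    k: number of features to keep (k <= m).
    """
    m = len(X[0])
    sums = {0: [0.0] * m, 1: [0.0] * m}
    counts = {0: 0, 1: 0}
    # One pass over all n samples and m features: O(mn).
    for row, label in zip(X, y):
        counts[label] += 1
        acc = sums[label]
        for j, value in enumerate(row):
            acc[j] += value
    c0 = [s / counts[0] for s in sums[0]]
    c1 = [s / counts[1] for s in sums[1]]
    # Keep the k features with the largest centroid gap.
    # heapq.nlargest uses a bounded heap: O(m log k).
    keep = heapq.nlargest(k, range(m), key=lambda j: abs(c0[j] - c1[j]))
    return {j: (c0[j], c1[j]) for j in keep}


def predict(model, x):
    """Assign x to the nearer centroid (squared l2 distance) using only
    the k retained features: O(k)."""
    d0 = sum((x[j] - a) ** 2 for j, (a, _) in model.items())
    d1 = sum((x[j] - b) ** 2 for j, (_, b) in model.items())
    return 0 if d0 <= d1 else 1
```

Because only the retained features are stored in the model, classifying a new sample touches k values regardless of m.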
2
Johns H, Bernhardt J, Churilov L. Distance-based Classification and Regression Trees for the analysis of complex predictors in health and medical research. Stat Methods Med Res 2021; 30:2085-2104. [PMID: 34319834 DOI: 10.1177/09622802211032712] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Predicting patient outcomes based on patient characteristics and care processes is a common task in medical research. Such predictive features are often multifaceted and complex, and are usually simplified into one or more scalar variables to facilitate statistical analysis. This process, while necessary, results in a loss of important clinical detail. While this loss may be prevented by using distance-based predictive methods which better represent complex healthcare features, the statistical literature on such methods is limited, and the range of tools facilitating distance-based analysis is substantially smaller than those of other methods. Consequently, medical researchers must choose to either reduce complex predictive features to scalar variables to facilitate analysis, or instead use a limited number of distance-based predictive methods which may not fulfil the needs of the analysis problem at hand. We address this limitation by developing a Distance-Based extension of Classification and Regression Trees (DB-CART) capable of making distance-based predictions of categorical, ordinal and numeric patient outcomes. We also demonstrate how this extension is compatible with other extensions to CART, including a recently published method for predicting care trajectories in chronic disease. We demonstrate DB-CART by using it to expand upon previously published dose-response analysis of stroke rehabilitation data. Our method identified additional detail not captured by the previously published analysis, reinforcing previous conclusions. We also demonstrate how by combining DB-CART with other extensions to CART, the method is capable of making predictions about complex, multifaceted outcome data based on complex, multifaceted predictive features.
Affiliation(s)
- Hannah Johns
- Center for Research Excellence in Stroke Rehabilitation and Brain Recovery, Heidelberg, VIC, Australia; Florey Institute of Neuroscience and Mental Health, Heidelberg, VIC, Australia; Melbourne Medical School, University of Melbourne, Parkville, VIC, Australia
- Julie Bernhardt
- Center for Research Excellence in Stroke Rehabilitation and Brain Recovery, Heidelberg, VIC, Australia; Florey Institute of Neuroscience and Mental Health, Heidelberg, VIC, Australia
- Leonid Churilov
- Florey Institute of Neuroscience and Mental Health, Heidelberg, VIC, Australia; Melbourne Medical School, University of Melbourne, Parkville, VIC, Australia
3
Ozer ME, Sarica PO, Arga KY. New Machine Learning Applications to Accelerate Personalized Medicine in Breast Cancer: Rise of the Support Vector Machines. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 24:241-246. [PMID: 32228365 DOI: 10.1089/omi.2020.0001] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 12/24/2022]
Abstract
Artificial intelligence, machine learning, health care robots, and algorithms for clinical decision-making are currently being sought after in diverse fields of clinical medicine and bioengineering. The field of personalized medicine stands to benefit from new technologies so as to harness the omics big data, for example, to individualize and accelerate cancer diagnostics and therapeutics in particular. In this overarching context, breast cancer is one of the most common malignancies worldwide with multiple underlying molecular etiologies and each subtype displaying diverse clinical outcomes. Disease stratification for breast cancer is, therefore, vital to its effective and individualized clinical care. The support vector machine (SVM) is a rising machine learning approach that offers robust classification of high-dimensional big data into small numbers of data points (support vectors), achieving differentiation of subgroups in a short amount of time. Considering the rapid timelines required for both diagnosis and treatment of most aggressive cancers, this new machine learning technique has important clinical and public applications and implications for high-throughput data analysis and contextualization. This expert review describes and examines, first, the SVM models employed to forecast breast cancer subtypes using diverse systems science data, including transcriptomics, epigenetics, proteomics, and radiomics, as well as biological pathway, clinical, pathological, and biochemical data. Then, we compare the performance of the present SVM and other diagnostic and therapeutic prediction models across the data types. We conclude by emphasizing that data integration is a critical bottleneck in systems science, cancer research and development, and health care innovation and that SVM and machine learning approaches offer new solutions and ways forward in biomedical, bioengineering, and clinical applications.
Affiliation(s)
- Mustafa Erhan Ozer
- Department of Bioengineering, Faculty of Engineering, Marmara University, Istanbul, Turkey
- Pemra Ozbek Sarica
- Department of Bioengineering, Faculty of Engineering, Marmara University, Istanbul, Turkey
- Kazim Yalcin Arga
- Department of Bioengineering, Faculty of Engineering, Marmara University, Istanbul, Turkey; Health Institutes of Turkey, Istanbul, Turkey
4
Integrative analysis of diffusion-weighted MRI and genomic data to inform treatment of glioblastoma. J Neurooncol 2016; 129:289-300. [PMID: 27393347 DOI: 10.1007/s11060-016-2174-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/10/2015] [Accepted: 06/04/2016] [Indexed: 12/15/2022]
Abstract
Gene expression profiling from glioblastoma (GBM) patients enables characterization of cancer into subtypes that can be predictive of response to therapy. An integrative analysis of imaging and gene expression data can potentially be used to obtain novel biomarkers that are closely associated with the genetic subtype and gene signatures and thus provide a noninvasive approach to stratify GBM patients. In this retrospective study, we analyzed the expression of 12,042 genes for 558 patients from The Cancer Genome Atlas (TCGA). Among these patients, 50 patients had magnetic resonance imaging (MRI) studies including diffusion weighted (DW) MRI in The Cancer Imaging Archive (TCIA). We identified the contrast enhancing region of the tumors using the pre- and post-contrast T1-weighted MRI images and computed the apparent diffusion coefficient (ADC) histograms from the DW-MRI images. Using the gene expression data, we classified patients into four molecular subtypes, determined the number and composition of gene modules using the gap statistic, and computed gene signature scores. We used logistic regression to find significant predictors of GBM subtypes. We compared the predictors for different subtypes using Mann-Whitney U tests. We assessed detection power using the area under the receiver operating characteristic (ROC) curve. We computed Spearman correlations to determine the associations between ADC and each of the gene signatures. We performed gene enrichment analysis using Ingenuity Pathway Analysis (IPA). We adjusted all p values using the Benjamini and Hochberg method. The mean ADC was a significant predictor for the neural subtype. Neural tumors had a significantly lower mean ADC compared to non-neural tumors ([Formula: see text]), with mean ADC of [Formula: see text] and [Formula: see text] for neural and non-neural tumors, respectively. Mean ADC showed an area under the ROC curve of 0.75 for detecting neural tumors. We found eight gene modules in the GBM cohort. The mean ADC was significantly correlated with the gene signature related to dendritic cell maturation ([Formula: see text], [Formula: see text]). Mean ADC could be used as a biomarker of a gene signature associated with dendritic cell maturation and to assist in identifying patients with neural GBMs, known to be resistant to aggressive standard of care.
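The abstract states that p values were adjusted with the Benjamini and Hochberg method. A minimal sketch of the standard step-up adjustment follows; it is a generic illustration, not the study's code, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    For the p value with ascending rank r among n tests, the raw
    adjustment is p * n / r. Walking from the largest p value down and
    taking a running minimum enforces monotonicity and caps values at 1.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at false discovery rate q when its adjusted p value is at most q.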
5
Viviano G, Salerno F, Manfredi EC, Polesello S, Valsecchi S, Tartari G. Surrogate measures for providing high frequency estimates of total phosphorus concentrations in urban watersheds. WATER RESEARCH 2014; 64:265-277. [PMID: 25076012 DOI: 10.1016/j.watres.2014.07.009] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 02/28/2014] [Revised: 06/30/2014] [Accepted: 07/03/2014] [Indexed: 06/03/2023]
Abstract
Until robust in situ sensors for total phosphorus (TP) are developed, continuous water quality measurements have the potential to be used as surrogates for generating high frequency estimates. Their use has widespread implications for water quality monitoring programmes considering that TP, in particular, is generally recognised as the limiting factor in the process of eutrophication. Surrogate measures for TP concentration, such as turbidity, have proved useful within natural and agricultural contexts, but their predictive capability for urban watersheds is considered more difficult, due to the different sources of TP, though a strict relationship with turbidity/suspended matter has been clearly described even for these environments. In this context, we investigated this still unresolved problem for high frequency estimation of TP concentration in urban environments by monitoring a medium-sized (71 km²) urban watershed (Lambro River watershed, north Italy) in which we detected 60 active combined sewer overflows, and its natural sub-basin for comparison. We found two different relationships between turbidity and TP concentration in the investigated urban watershed that differently describe the prevalence of TP from point sources (domestic wastewaters) or diffuse origin (surface runoff). In this regard, we first characterise the prevailing sources of TP by using a marker for detecting domestic wastewater contamination (caffeine), then we describe the mutual relationships amongst the continuously monitored variables (in our case the occurrence of the First Flush and the clockwise turbidity/discharge hysteresis). Afterwards we discriminate, by observing variables that are continuously monitored (in our case, the discharge and the turbidity), amongst the continuous surrogate records according to their sources. In conclusion, we are able to apply the relevant turbidity/TP regression equations to each turbidity record and, thus, estimate the respective TP concentrations with high frequency. If traditional grab sampling techniques had been employed, the contributions of point sources (up to 34% across 237 monitored days) to the total estimated loads would not have been correctly evaluated, whilst the high frequency monitoring is able to catch the dynamics that occur over time scales of a few hours. We conclude that the reasonable uncertainty obtained in this study can be achieved in other urban watersheds, but further studies are required for watersheds of differing sizes and degrees of urbanisation.
Affiliation(s)
- Gaetano Viviano
- CNR - Water Research Institute (IRSA), Via del Mulino 19, Brugherio (MB) 20861, Italy
- Franco Salerno
- CNR - Water Research Institute (IRSA), Via del Mulino 19, Brugherio (MB) 20861, Italy
- Stefano Polesello
- CNR - Water Research Institute (IRSA), Via del Mulino 19, Brugherio (MB) 20861, Italy
- Sara Valsecchi
- CNR - Water Research Institute (IRSA), Via del Mulino 19, Brugherio (MB) 20861, Italy
- Gianni Tartari
- CNR - Water Research Institute (IRSA), Via del Mulino 19, Brugherio (MB) 20861, Italy
6
Hardin J, Garcia SR, Golan D. A method for generating realistic correlation matrices. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas638] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022]
7
Pang H, Tong T, Ng M. Block-diagonal discriminant analysis and its bias-corrected rules. Stat Appl Genet Mol Biol 2013; 12:347-59. [PMID: 23735433 DOI: 10.1515/sagmb-2012-0017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
High-throughput expression profiling allows simultaneous measurement of tens of thousands of genes. These data have motivated the development of reliable biomarkers for disease subtype identification and diagnosis. Many methods have been developed in the literature for analyzing these data, such as diagonal discriminant analysis, support vector machines, and k-nearest neighbor methods. The diagonal discriminant methods have been shown to perform well for high-dimensional data with small sample sizes. Despite their popularity, the independence assumption is unlikely to be true in practice. Recently, a gene module based linear discriminant analysis strategy has been proposed by utilizing the correlation among genes in discriminant analysis. However, the approach can be underpowered when the samples of the two classes are unbalanced. In this paper, we propose to correct the biases in the discriminant scores of block-diagonal discriminant analysis. In simulation studies, our proposed method outperforms other approaches in various settings. We also illustrate our proposed discriminant analysis method for analyzing microarray data studies.
Affiliation(s)
- Herbert Pang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
8
Wang SL, Li XL, Fang J. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification. BMC Bioinformatics 2012; 13:178. [PMID: 22830977 PMCID: PMC3465202 DOI: 10.1186/1471-2105-13-178] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 12/22/2011] [Accepted: 05/18/2012] [Indexed: 01/03/2023]
Abstract
Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using a majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, these findings are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
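The two post-processing steps named in the abstract, ranking genes by how often they occur in the selected subsets and combining individual classifiers by majority vote, can be sketched as follows. This is only an illustration of those two ideas, not the HBSA search itself; the function names and the gene symbols in the usage note are hypothetical.

```python
from collections import Counter


def rank_genes_by_frequency(subsets):
    """Rank genes by the number of selected subsets that contain them.

    subsets: iterable of gene-name collections.
    Returns (gene, count) pairs sorted by descending count, with ties
    broken alphabetically so the result is deterministic.
    """
    counts = Counter(gene for subset in subsets for gene in set(subset))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def majority_vote(predictions):
    """Return the label predicted by the most individual classifiers.

    Ties are resolved in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For example, `rank_genes_by_frequency([["TP53", "BRCA1"], ["TP53", "MYC"]])` places `TP53` first because it occurs in both subsets.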
Affiliation(s)
- Shu-Lin Wang
- Applied Bioinformatics Laboratory, University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA
10
Zhang JG, Li J, Tang W, Deng HW. Fusing Gene Interaction to Improve Disease Discrimination on Classification Analysis. ADVANCES IN GENETICS 2012; 1:1000102. [PMID: 23814698 DOI: 10.4172/age.1000102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
It is usually observed that among genes there exist strong statistical interactions associated with diseases of public health importance. Gene interactions can potentially contribute to the improvement of disease classification accuracy. Especially when gene expression differences across classes are not great enough, it is more important to make use of gene interactions for disease classification analyses. However, most gene selection algorithms in classification analyses merely focus on genes whose expression levels show differences across classes, and ignore the discriminatory information from gene interactions. In this study, we develop a two-stage algorithm that can take gene interaction into account during a gene selection procedure. Its biggest advantage is that it can exploit discriminatory information from gene interactions as well as gene expression differences, by using "Bayes error" as a gene selection criterion. Using simulated and real microarray data sets, we demonstrate the ability of gene interactions to improve classification accuracy, and show that the proposed algorithm can yield small informative sets of genes while leading to highly accurate classification results. Thus our study may give a novel insight for future gene selection algorithms for human disease discrimination.
Affiliation(s)
- Ji-Gang Zhang
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, USA
11
Huang S, Tong T, Zhao H. Bias-corrected diagonal discriminant rules for high-dimensional classification. Biometrics 2011; 66:1096-106. [PMID: 20222939 DOI: 10.1111/j.1541-0420.2010.01395.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
Diagonal discriminant rules have been successfully used for high-dimensional classification problems, but suffer from the serious drawback of biased discriminant scores. In this article, we propose improved diagonal discriminant rules with bias-corrected discriminant scores for high-dimensional classification. We show that the proposed discriminant scores dominate the standard ones under the quadratic loss function. Analytical results on why the bias-corrected rules can potentially improve the prediction accuracy are also provided. Finally, we demonstrate the improvement of the proposed rules over the original ones through extensive simulation studies and real case studies.
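For orientation, the standard (uncorrected) diagonal linear discriminant rule that such work starts from can be sketched as below. The bias correction itself is not reproduced, since the abstract does not give its form. The function names `fit_dlda` and `predict_dlda` are hypothetical, and the sketch assumes every feature has non-zero pooled variance.

```python
import math


def fit_dlda(X, y):
    """Fit a diagonal linear discriminant rule.

    X: list of n samples, each a list of m feature values.
    y: list of n class labels.
    Returns per-class means, pooled per-feature variances, and class priors.
    """
    classes = sorted(set(y))
    n, m = len(X), len(X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        means[c] = [sum(row[j] for row in rows) / len(rows) for j in range(m)]
        priors[c] = len(rows) / n
    # Pooled within-class variance per feature; covariances are ignored,
    # which is the "diagonal" (independence) assumption.
    var = [0.0] * m
    for row, label in zip(X, y):
        for j in range(m):
            var[j] += (row[j] - means[label][j]) ** 2
    var = [v / (n - len(classes)) for v in var]
    return means, var, priors


def predict_dlda(model, x):
    """Assign x to the class with the smallest discriminant score:
    sum_j (x_j - mean_cj)^2 / var_j - 2 * log(prior_c)."""
    means, var, priors = model

    def score(c):
        distance = sum((x[j] - means[c][j]) ** 2 / var[j] for j in range(len(x)))
        return distance - 2 * math.log(priors[c])

    return min(means, key=score)
```

With few samples per class, the plug-in means and variances make these scores biased estimates of their population counterparts, which is the drawback the abstract refers to.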
Affiliation(s)
- Song Huang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
13
Konstantinopoulos PA, Spentzos D, Karlan BY, Taniguchi T, Fountzilas E, Francoeur N, Levine DA, Cannistra SA. Gene expression profile of BRCAness that correlates with responsiveness to chemotherapy and with outcome in patients with epithelial ovarian cancer. J Clin Oncol 2010; 28:3555-61. [PMID: 20547991 DOI: 10.1200/jco.2009.27.5719] [Citation(s) in RCA: 372] [Impact Index Per Article: 24.8] [Indexed: 01/13/2023]
Abstract
PURPOSE To define a gene expression profile of BRCAness that correlates with chemotherapy response and outcome in epithelial ovarian cancer (EOC). METHODS A publicly available microarray data set including 61 patients with EOC with either sporadic disease or BRCA(1/2) germline mutations was used for development of the BRCAness profile. Correlation with platinum responsiveness was assessed in platinum-sensitive and platinum-resistant tumor biopsy specimens from six patients with BRCA germline mutations. Association with poly-ADP ribose polymerase (PARP) inhibitor responsiveness and with radiation-induced RAD51 foci formation (a surrogate of homologous recombination) was assessed in Capan-1 cell line clones. The BRCAness profile was validated in 70 patients enriched for sporadic disease to assess its association with outcome. RESULTS The BRCAness profile accurately predicted platinum responsiveness in eight out of 10 patient-derived tumor specimens, and distinguished between PARP-inhibitor sensitivity and resistance in four out of four Capan-1 clones. [corrected] When applied to the 70 patients with sporadic disease, patients with the BRCA-like (BL) profile had improved disease-free survival (34 months v 15 months; log-rank P = .013) and overall survival (72 months v 41 months; log-rank P = .006) compared with patients with a non-BRCA-like (NBL) profile. The BRCAness profile maintained independent prognostic value in multivariate analysis, which controlled for other known clinical prognostic factors. CONCLUSION The BRCAness profile correlates with responsiveness to platinum and PARP inhibitors and identifies a subset of sporadic patients with improved outcome. Additional evaluation of this profile as a predictive tool in patients with sporadic EOC is warranted.
14
Abraham G, Kowalczyk A, Loi S, Haviv I, Zobel J. Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context. BMC Bioinformatics 2010; 11:277. [PMID: 20500821 PMCID: PMC2895626 DOI: 10.1186/1471-2105-11-277] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 02/08/2010] [Accepted: 05/25/2010] [Indexed: 02/08/2023]
Abstract
Background Different microarray studies have compiled gene lists for predicting outcomes of a range of treatments and diseases. These have produced gene lists that have little overlap, indicating that the results from any one study are unstable. It has been suggested that the underlying pathways are essentially identical, and that the expression of gene sets, rather than that of individual genes, may be more informative with respect to prognosis and understanding of the underlying biological process. Results We sought to examine the stability of prognostic signatures based on gene sets rather than individual genes. We classified breast cancer cases from five microarray studies according to the risk of metastasis, using features derived from predefined gene sets. The expression levels of genes in the sets are aggregated, using what we call a set statistic. The resulting prognostic gene sets were as predictive as the lists of individual genes, but displayed more consistent rankings via bootstrap replications within datasets, produced more stable classifiers across different datasets, and are potentially more interpretable in the biological context since they examine gene expression in the context of their neighbouring genes in the pathway. In addition, we performed this analysis in each breast cancer molecular subtype, based on ER/HER2 status. The prognostic gene sets found in each subtype were consistent with the biology based on previous analysis of individual genes. Conclusions To date, most analyses of gene expression data have focused at the level of the individual genes. We show that a complementary approach of examining the data using predefined gene sets can reduce the noise and could provide increased insight into the underlying biological pathways.
Affiliation(s)
- Gad Abraham
- Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3010, VIC, Australia
15
Konstantinopoulos PA, Fountzilas E, Goldsmith JD, Bhasin M, Pillay K, Francoeur N, Libermann TA, Gebhardt MC, Spentzos D. Analysis of multiple sarcoma expression datasets: implications for classification, oncogenic pathway activation and chemotherapy resistance. PLoS One 2010; 5:e9747. [PMID: 20368975 PMCID: PMC2848563 DOI: 10.1371/journal.pone.0009747] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 09/28/2009] [Accepted: 01/21/2010] [Indexed: 01/13/2023]
Abstract
Background Diagnosis of soft tissue sarcomas (STS) is challenging. Many remain unclassified (not-otherwise-specified, NOS) or grouped in controversial categories such as malignant fibrous histiocytoma (MFH), with unclear therapeutic value. We analyzed several independent microarray datasets, to identify a predictor, use it to classify unclassifiable sarcomas, and assess oncogenic pathway activation and chemotherapy response. Methodology/Principal Findings We analyzed 5 independent datasets (325 tumor arrays). We developed and validated a predictor, which was used to reclassify MFH and NOS sarcomas. The molecular “match” between MFH and their predicted subtypes was assessed using genome-wide hierarchical clustering and Subclass-Mapping. Findings were validated in 15 paraffin samples profiled on the DASL platform. Bayesian models of oncogenic pathway activation and chemotherapy response were applied to individual STS samples. A 170-gene predictor was developed and independently validated (80-85% accuracy in all datasets). Most MFH and NOS tumors were reclassified as leiomyosarcomas, liposarcomas and fibrosarcomas. “Molecular match” between MFH and their predicted STS subtypes was confirmed both within and across datasets. This classification revealed previously unrecognized tissue differentiation lines (adipocyte, fibroblastic, smooth-muscle) and was reproduced in paraffin specimens. Different sarcoma subtypes demonstrated distinct oncogenic pathway activation patterns, and reclassified MFH tumors shared oncogenic pathway activation patterns with their predicted subtypes. These patterns were associated with predicted resistance to chemotherapeutic agents commonly used in sarcomas. Conclusions/Significance STS profiling can aid in diagnosis through a predictor tracking distinct tissue differentiation in unclassified tumors, and in therapeutic management via oncogenic pathway activation and chemotherapy response assessment.
Affiliation(s)
- Panagiotis A. Konstantinopoulos
- Division of Hematology/Oncology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Elena Fountzilas
- Division of Hematology/Oncology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Jeffrey D. Goldsmith
- Department of Pathology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Manoj Bhasin
- Genomics Center and Division of Interdisciplinary Medicine and Biotechnology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Kamana Pillay
- Division of Hematology/Oncology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Nancy Francoeur
- Division of Hematology/Oncology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Towia A. Libermann
- Genomics Center and Division of Interdisciplinary Medicine and Biotechnology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Mark C. Gebhardt
- Department of Orthopedic Surgery, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Dimitrios Spentzos
- Division of Hematology/Oncology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
- Genomics Center and Division of Interdisciplinary Medicine and Biotechnology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America
16
Ahdesmäki M, Strimmer K. Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Ann Appl Stat 2010. [DOI: 10.1214/09-aoas277] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.0] [Indexed: 11/19/2022]
18
Hall P, Titterington DM, Xue JH. Tilting methods for assessing the influence of components in a classifier. J R Stat Soc Series B Stat Methodol 2009. [DOI: 10.1111/j.1467-9868.2009.00701.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]