351
Bayesian variable selection in multinomial probit model for classifying high-dimensional data. Comput Stat 2014. [DOI: 10.1007/s00180-014-0540-z]
352
Song Y, Zhu X, Lin L. Independent feature screening for ultrahigh-dimensional models with interactions. J Korean Stat Soc 2014. [DOI: 10.1016/j.jkss.2014.03.001]
353
Katayama S, Kano Y. A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix. Commun Stat Theory Methods 2014. [DOI: 10.1080/03610926.2012.717663]
354
Jung SH, Chen Y, Ahn H. Type I error control for tree classification. Cancer Inform 2014; 13:11-8. [PMID: 25452689 PMCID: PMC4237155 DOI: 10.4137/cin.s16342] [Received: 08/12/2014] [Revised: 10/05/2014] [Accepted: 10/08/2014] [Open Access]
Abstract
Binary tree classification has been useful for classifying a whole population based on the levels of an outcome variable that is associated with chosen predictors. Often we start a classification with a large number of candidate predictors, and each predictor takes a number of different cutoff values. Because of these types of multiplicity, the binary tree classification method is subject to a severely inflated type I error probability. Nonetheless, few publications have addressed this issue. In this paper, we propose a binary tree classification method that keeps the probability of accepting a predictor below a certain level, say 5%.
Affiliation(s)
- Sin-Ho Jung
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
- Yong Chen
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA
- Hongshik Ahn
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA
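The multiplicity concern raised in the abstract of entry 354 (many candidate predictors, each with several cutoffs) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every predictor-cutoff pair is tested independently at the same level; this is not the procedure proposed by Jung et al., and the function names `familywise_error` and `bonferroni_level` are hypothetical. Under that assumption, the chance of at least one false positive among m tests at level α is 1 − (1 − α)^m, and a Bonferroni split of α/m keeps it at or below α.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at level alpha (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test level under a Bonferroni correction (a generic remedy,
    not the procedure proposed in the cited paper)."""
    return alpha / n_tests


# Hypothetical setting: 100 candidate predictors, 10 cutoffs each.
n_tests = 100 * 10
uncorrected = familywise_error(0.05, n_tests)
corrected = familywise_error(bonferroni_level(0.05, n_tests), n_tests)
print(f"uncorrected: {uncorrected:.4f}, Bonferroni: {corrected:.4f}")
```

With this many candidate splits, the uncorrected error rate is essentially 1, which is why a tree grown without any adjustment will almost always accept some predictor even when none is truly associated with the outcome.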
355
Yin X, Hilafu H. Sequential sufficient dimension reduction for large p, small n problems. J R Stat Soc Series B Stat Methodol 2014. [DOI: 10.1111/rssb.12093]
356
Roy Chowdhury N, Cook D, Hofmann H, Majumder M, Lee EK, Toth AL. Using visual statistical inference to better understand random class separations in high dimension, low sample size data. Comput Stat 2014. [DOI: 10.1007/s00180-014-0534-x]
357
Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.08.032]
358
Kim NY, Lee EK. Comparison of Variable Importance Measures in Tree-based Classification. Korean J Appl Stat 2014. [DOI: 10.5351/kjas.2014.27.5.717]
359
Chen HC, Zou W, Lu TP, Chen JJ. A composite model for subgroup identification and prediction via bicluster analysis. PLoS One 2014; 9:e111318. [PMID: 25347824 PMCID: PMC4210136 DOI: 10.1371/journal.pone.0111318] [Received: 01/17/2014] [Accepted: 09/30/2014] [Open Access]
Abstract
Background: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model to classify subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response.
Methods: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all binary classifiers, is constructed to classify samples into several disjoint subgroups. The proposed composite model depends neither on any specific biclustering algorithm or pattern of biclusters nor on any particular classification algorithm.
Results: The composite model was shown to have an overall accuracy of 97.4% for a synthetic dataset consisting of four subgroups. The model was applied to two datasets where the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating lung cancer adenocarcinoma and squamous carcinoma subtypes, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset.
Conclusion: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data.
The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identification of unknown species or new biomarkers for targeted therapy.
Affiliation(s)
- Hung-Chia Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
- Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Tzu-Pin Lu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Department of Public Health, Graduate Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- James J. Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
- Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
360
Stetson LC, Pearl T, Chen Y, Barnholtz-Sloan JS. Computational identification of multi-omic correlates of anticancer therapeutic response. BMC Genomics 2014; 15 Suppl 7:S2. [PMID: 25573145 PMCID: PMC4243102 DOI: 10.1186/1471-2164-15-s7-s2] [Open Access]
Abstract
Background: A challenge in precision medicine is the transformation of genomic data into knowledge that can be used to stratify patients into treatment groups based on predicted clinical response. Although clinical trials remain the only way to truly measure drug toxicities and effectiveness, as a scientific community we lack the resources to clinically assess all drugs presently under development. Therefore, an effective preclinical model system that enables prediction of anticancer drug response could significantly speed the broader adoption of personalized medicine.
Results: Three large-scale pharmacogenomic studies have screened anticancer compounds in more than 1000 distinct human cancer cell lines. We combined these datasets to generate and validate multi-omic predictors of drug response. We compared drug response signatures built using a penalized linear regression model and two non-linear machine learning techniques, random forest and support vector machine. The precision and robustness of each drug response signature were assessed using cross-validation across three independent datasets. Fifteen drugs were common among the datasets. We validated prediction signatures for eleven out of fifteen tested drugs (17-AAG, AZD0530, AZD6244, Erlotinib, Lapatinib, Nutlin-3, Paclitaxel, PD0325901, PD0332991, PF02341066, and PLX4720).
Conclusions: Multi-omic predictors of drug response can be generated and validated for many drugs. Specifically, the random forest algorithm generated more precise and robust prediction signatures when compared to support vector machines and the more commonly used elastic net regression. The resulting drug response signatures can be used to stratify patients into treatment groups based on their individual tumor biology, with two major benefits: speeding the process of bringing preclinical drugs to market, and the repurposing and repositioning of existing anticancer therapies.
361
Tarca AL, Than NG, Romero R. Methodological approach from the Best Overall Team in the sbv IMPROVER Diagnostic Signature Challenge. Syst Biomed 2014. [DOI: 10.4161/sysb.25980]
362
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A, Benítez J, Herrera F. A review of microarray datasets and applied feature selection methods. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2014.05.042]
363
Zheng B, Liu J, Gu J, Lu Y, Zhang W, Li M, Lu H. A three-gene panel that distinguishes benign from malignant thyroid nodules. Int J Cancer 2014; 136:1646-54. [PMID: 25175491 DOI: 10.1002/ijc.29172] [Received: 12/30/2013] [Revised: 06/27/2014] [Accepted: 07/03/2014]
Abstract
Reliable preoperative diagnosis of malignant thyroid tumors remains challenging because of the inconclusive cytological examination of fine-needle aspiration biopsies. Although numerous studies have successfully demonstrated the use of high-throughput molecular diagnostics in cancer prediction, the application of microarrays in routine clinical use remains limited. Our aim was, therefore, to identify a small subset of genes to develop a practical and inexpensive diagnostic tool for clinical use. We developed a two-step feature selection method composed of a linear models for microarray data (LIMMA) linear model and an iterative Bayesian model averaging model to identify a suitable gene set signature. Using one public dataset for training, we discovered a three-gene signature dipeptidyl-peptidase 4 (DPP4), secretogranin V (SCG5) and carbonic anhydrase XII (CA12). We then evaluated the robustness of our gene set using three other independent public datasets. The gene signature accuracy was 85.7, 78.8 and 85.7%, respectively. For experimental validation, we collected 70 thyroid samples from surgery and our three-gene signature method achieved an accuracy of 94.3% by quantitative polymerase chain reaction (QPCR) experiment. Furthermore, immunohistochemistry in 29 samples showed proteins expressed by these three genes are also differentially expressed in thyroid samples. Our protocol discovered a robust three-gene signature that can distinguish benign from malignant thyroid tumors, which will have daily clinical application.
Affiliation(s)
- Bing Zheng
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of Molecular Embryology, Ministry of Health and Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai, China; Department of Laboratory Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
364
Afsari B, Braga-Neto UM, Geman D. Rank discriminants for predicting phenotypes from RNA expression. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas738]
365
Himeno T, Yamada T. Estimations for some functions of covariance matrix in high dimension under non-normality and its applications. J Multivar Anal 2014. [DOI: 10.1016/j.jmva.2014.04.020]
366
Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions. Mach Learn 2014. [DOI: 10.1007/s10994-014-5466-8]
367
Ahmad MR, Rosen DV. Tests for high-dimensional covariance matrices using the theory of U-statistics. J Stat Comput Simul 2014. [DOI: 10.1080/00949655.2014.948441]
368
Islam AKMT, Jeong BS, Bari ATMG, Lim CG, Jeon SH. MapReduce based parallel gene selection method. Appl Intell 2014. [DOI: 10.1007/s10489-014-0561-x]
369
Sun ZL, Wang H, Lau WS, Seet G, Wang D, Lam KM. Microarray data classification using the spectral-feature-based TLS ensemble algorithm. IEEE Trans Nanobioscience 2014; 13:289-99. [PMID: 25014962 DOI: 10.1109/tnb.2014.2327804]
Abstract
The reliable and accurate identification of cancer categories is crucial to a successful diagnosis and a proper treatment of the disease. In most existing work, samples of gene expression data are treated as one-dimensional signals, and are analyzed by means of some statistical signal processing techniques or intelligent computation algorithms. In this paper, from an image-processing viewpoint, a spectral-feature-based Tikhonov-regularized least-squares (TLS) ensemble algorithm is proposed for cancer classification using gene expression data. In the TLS model, a test sample is represented as a linear combination of the atoms of a dictionary. Two types of dictionaries, namely singular value decomposition (SVD)-based eigenassays and independent component analysis (ICA)-based eigenassays, are proposed for the TLS model, and both are extracted via a two-stage approach. The proposed algorithm is inspired by our finding that, among these eigenassays, the categories of some of the testing samples can be assigned correctly by using the TLS models formed from some of the spectral features, but not for those formed from the original samples only. In order to retain the positive characteristics of these spectral features in making correct category assignments, a strategy of classifier committee learning (CCL) is designed to combine the results obtained from the different spectral features. Experimental results on standard databases demonstrate the feasibility and effectiveness of the proposed method.
370
Dumeaux V, Ursini-Siegel J, Flatberg A, Fjosne HE, Frantzen JO, Holmen MM, Rodegerdts E, Schlichting E, Lund E. Peripheral blood cells inform on the presence of breast cancer: a population-based case-control study. Int J Cancer 2014; 136:656-67. [PMID: 24931809 PMCID: PMC4278533 DOI: 10.1002/ijc.29030] [Received: 12/20/2013] [Revised: 04/04/2014] [Accepted: 06/02/2014]
Abstract
Tumor–host interactions extend beyond the local microenvironment, and cancer development largely depends on the ability of malignant cells to hijack and exploit the normal physiological processes of the host. Here, we established that many genes within peripheral blood cells show differential expression when an untreated breast cancer (BC) is present, and harnessed this fact to construct a 50-gene signature that distinguishes BC patients from population-based controls. Our results were derived from a series of large datasets within our unique population-based Norwegian Women and Cancer cohort that allowed us to investigate the influence of medications and tumor characteristics on our blood-based test, and were further tested in two external datasets. Our 50-gene signature contained cytostatic signals, including the specific suppression of the immune response, and medications influencing transcription involved in those processes were identified as confounders. Through analysis of the biological processes differentially expressed in blood, we were able to provide a rationale as to why the systemic response of the host may be a reliable marker of BC, characterized by the underexpression of both immune-specific pathways and “universal” cell programs driven by MYC (i.e., metabolism, growth and cell cycle). In conclusion, gene expression of peripheral blood cells is markedly perturbed by the specific presence of carcinoma in the breast, and these changes simultaneously engage a number of systemic cytostatic signals emerging connections with immune escape of BC.
Affiliation(s)
- Vanessa Dumeaux
- Institute of Community Medicine, University of Tromsø, Tromsø, Norway; Department of Oncology, Faculty of Medicine, McGill University, Montreal, Canada
371
Estimation of variances and covariances for high‐dimensional data: a selective review. WIREs Comput Stat 2014. [DOI: 10.1002/wics.1308]
372
Han F, Sun W, Ling QH. A novel strategy for gene selection of microarray data based on gene-to-class sensitivity information. PLoS One 2014; 9:e97530. [PMID: 24844313 PMCID: PMC4028211 DOI: 10.1371/journal.pone.0097530] [Received: 12/06/2013] [Accepted: 04/21/2014] [Open Access]
Abstract
To obtain predictive genes with lower redundancy and better interpretability, a hybrid gene selection method encoding prior information is proposed in this paper. First, the prior information, referred to as the gene-to-class sensitivity (GCS) of all genes from microarray data, is exploited by a single-hidden-layer feedforward neural network (SLFN). Then, to select more representative and less redundant genes, all genes are grouped into clusters by the K-means method, and some low-sensitivity genes are filtered out according to their GCS values. Finally, a modified binary particle swarm optimization (BPSO) encoding the GCS information is proposed to perform further gene selection from the remaining genes. Because it considers the GCS information, the proposed method selects genes that are highly correlated with sample classes. Thus, the low-redundancy gene subsets obtained by the proposed method also help improve classification accuracy on microarray data. Experimental results on several open microarray datasets verify the effectiveness and efficiency of the proposed approach.
Affiliation(s)
- Fei Han
- School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
- Wei Sun
- School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
- Qing-Hua Ling
- School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
- School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, China
373
Shaw VE, Lane B, Jenkinson C, Cox T, Greenhalf W, Halloran CM, Tang J, Sutton R, Neoptolemos JP, Costello E. Serum cytokine biomarker panels for discriminating pancreatic cancer from benign pancreatic disease. Mol Cancer 2014; 13:114. [PMID: 24884871 PMCID: PMC4032456 DOI: 10.1186/1476-4598-13-114] [Received: 12/18/2013] [Accepted: 04/23/2014] [Open Access]
Abstract
BACKGROUND: We investigated whether combinations of serum cytokines, used with logistic disease predictor models, could facilitate the detection of pancreatic ductal adenocarcinoma (PDAC).
METHODS: The serum levels of 27 cytokines were measured in 241 subjects, 127 with PDAC, 49 with chronic pancreatitis, 20 with benign biliary obstruction and 45 healthy controls. Samples were split randomly into independent training and test sets. Cytokine biomarker panels were selected by identifying the top performing cytokines in best fit logistic regression models during multiple rounds of resampling from the training dataset. Disease prediction by logistic models, built using the resulting cytokine panels, was evaluated with training and test sets and further examined using resampled performance evaluation.
RESULTS: For the discrimination of PDAC patients from patients with benign disease, a panel of IP-10, IL-6, PDGF plus CA19-9 offered improved diagnostic performance over CA19-9 alone in the training (AUC 0.838 vs. 0.678) and independent test set (AUC 0.884 vs. 0.798). For the discrimination of PDAC from CP, a panel of IL-8, CA19-9, IL-6 and IP-10 offered improved diagnostic performance over CA19-9 alone with the training (AUC 0.880 vs. 0.758) and test set (AUC 0.912 vs. 0.848). Finally, for the discrimination of PDAC in the presence of jaundice from benign controls with jaundice, a panel of IP-10, IL-8, IL-1b and PDGF demonstrated improvement over CA19-9 in the training (AUC 0.810 vs. 0.614) and test set (AUC 0.857 vs. 0.659).
CONCLUSIONS: These findings support the potential role for cytokine panels in the discrimination of PDAC from patients with benign pancreatic diseases and warrant additional study.
MESH Headings
- Adult
- Aged
- Antigens, Tumor-Associated, Carbohydrate/blood
- Antigens, Tumor-Associated, Carbohydrate/genetics
- Biomarkers/blood
- Carcinoma, Pancreatic Ductal/blood
- Carcinoma, Pancreatic Ductal/diagnosis
- Carcinoma, Pancreatic Ductal/genetics
- Carcinoma, Pancreatic Ductal/pathology
- Case-Control Studies
- Cholestasis/blood
- Cholestasis/diagnosis
- Cholestasis/genetics
- Cholestasis/pathology
- Cytokines/blood
- Cytokines/genetics
- Diagnosis, Differential
- Female
- Gene Expression
- Humans
- Logistic Models
- Male
- Middle Aged
- Pancreas/metabolism
- Pancreas/pathology
- Pancreatic Neoplasms/blood
- Pancreatic Neoplasms/diagnosis
- Pancreatic Neoplasms/genetics
- Pancreatic Neoplasms/pathology
- Pancreatitis, Chronic/blood
- Pancreatitis, Chronic/diagnosis
- Pancreatitis, Chronic/genetics
- Pancreatitis, Chronic/pathology
- Platelet-Derived Growth Factor/genetics
- Platelet-Derived Growth Factor/metabolism
Affiliation(s)
- Victoria E Shaw
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Brian Lane
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Claire Jenkinson
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Trevor Cox
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- William Greenhalf
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Christopher M Halloran
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Joseph Tang
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Robert Sutton
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- John P Neoptolemos
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
- Eithne Costello
- NIHR Liverpool Pancreas Biomedical Research Unit, Royal Liverpool and Broadgreen University Hospital NHS Trust, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool L69 3GA, UK
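The AUC values reported in the abstract of entry 373 compare how well a panel's predicted score separates cases from controls. As a minimal sketch of what that number measures, the Python below computes the AUC as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half. The function name `auc` and the scores are hypothetical and made up for illustration; they are not data or code from the cited study.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control, counting ties as one half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up predicted probabilities, not data from the cited study.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation, which is the scale on which the panel-versus-CA19-9 comparisons above should be read.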
374
Grid topologies for the self-organizing map. Neural Netw 2014; 56:35-48. [PMID: 24861385 DOI: 10.1016/j.neunet.2014.05.001] [Received: 02/05/2014] [Revised: 04/28/2014] [Accepted: 05/01/2014]
Abstract
The original Self-Organizing Feature Map (SOFM) has been extended in many ways to suit different goals and application domains. However, the topologies of the map lattice that can be found in the literature are nearly always square or, more rarely, hexagonal. In this paper we study alternative grid topologies, which are derived from the geometrical theory of tessellations. Experimental results are presented for unsupervised clustering, color image segmentation and classification tasks, which show that the differences among the topologies are statistically significant in most cases, and that the optimal topology depends on the problem at hand. A theoretical interpretation of these results is also developed.
375
Distinguishing intrinsic limit cycles from forced oscillations in ecological time series. Theor Ecol 2014. [DOI: 10.1007/s12080-014-0225-9]
376
Kim KJ, Cho SB. Meta-classifiers for high-dimensional, small sample classification for gene expression analysis. Pattern Anal Appl 2014. [DOI: 10.1007/s10044-014-0369-7]
377
Jauhari S, Rizvi SAM. Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:533-547. [PMID: 26356021 DOI: 10.1109/tcbb.2014.2312002]
Abstract
An understanding of genetics and epigenetics is essential to cope with the paradigm shift that is underway. Personalized medicine and gene therapy will converge in the days to come. This review highlights traditional approaches as well as current advancements in the analysis of gene expression data from a cancer perspective. Due to improvements in biometric instrumentation and automation, it has become easier to collect large amounts of experimental data in molecular biology. Analysis of such data is extremely important, as it leads to knowledge discovery that can be validated by experiments. Previously, the diagnosis of complex genetic diseases was conventionally based on non-molecular characteristics such as the kind of tumor tissue, pathological characteristics, and clinical phase. Microarray data are characterized by high dimensionality and noise, which have been reasons for ineffective and imprecise results. Several machine learning and data mining techniques are presently applied for identifying cancer using gene expression data. While differences in efficiency do exist, none of the well-established approaches is uniformly superior to the others. The quality of an algorithm is important, but is not in itself a guarantee of the quality of a specific data analysis.
378
Kim SK, Roh YG, Park K, Kang TH, Kim WJ, Lee JS, Leem SH, Chu IS. Expression signature defined by FOXM1-CCNB1 activation predicts disease recurrence in non-muscle-invasive bladder cancer. Clin Cancer Res 2014; 20:3233-43. [PMID: 24714775 DOI: 10.1158/1078-0432.ccr-13-2761]
Abstract
PURPOSE Although standard treatment with transurethral resection and intravesical therapy (IVT) is known to be effective to address the clinical behavior of non-muscle-invasive bladder cancer (NMIBC), many patients fail to respond to the treatment and frequently experience disease recurrence. Here, we aim to identify a prognostic molecular signature that predicts the NMIBC heterogeneity and response to IVT. EXPERIMENTAL DESIGN We analyzed the genomic profiles of 102 patients with NMIBC to identify a signature associated with disease recurrence. The validity of the signature was verified in three independent patient cohorts (n = 658). Various statistical methods, including a leave-one-out cross-validation and multivariate Cox regression analyses, were applied to identify a signature. We confirmed an association between the signature and tumor aggressiveness with experimental assays using bladder cancer cell lines. RESULTS Gene expression profiling in 102 patients with NMIBC identified a CCNB1 signature associated with disease recurrence, which was validated in another three independent cohorts of 658 patients. The CCNB1 signature was shown to be an independent risk factor by a multivariate analysis and subset stratification according to stage and grade [HR, 2.93; 95% confidence intervals (CI), 1.302-6.594; P = 0.009]. The subset analysis also revealed that the signature could identify patients who would benefit from IVT. Finally, gene network analyses and experimental assays indicated that NMIBC recurrence could be mediated by FOXM1-CCNB1-Fanconi anemia pathways. CONCLUSIONS The CCNB1 signature represents a promising diagnostic tool to identify patients with NMIBC who have a high risk of recurrence and to predict response to IVT.
Affiliation(s)
- Seon-Kyu Kim, Yun-Gil Roh, Kiejung Park, Tae-Hong Kang, Wun-Jae Kim, Ju-Seog Lee, Sun-Hee Leem, In-Sun Chu
- Authors' Affiliations: Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon; Department of Biology, College of Natural Science, Dong-A University, Busan; Department of Urology, Chungbuk National University College of Medicine, Cheongju, Chungbuk, Korea; and Department of Systems Biology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
|
379
|
|
380
|
|
381
|
Abstract
Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's presidency and not from its beginning.
|
382
|
Wang C, Tong T, Cao L, Miao B. Non-parametric shrinkage mean estimation for quadratic loss functions with unknown covariance matrices. J MULTIVARIATE ANAL 2014. [DOI: 10.1016/j.jmva.2013.12.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
383
|
Hua L, Zhou P. Combining protein-protein interactions information with support vector machine to identify chronic obstructive pulmonary disease related genes. Mol Biol 2014; 48:287-296. [DOI: 10.1134/s0026893314020101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
384
|
Yang P, Yoo PD, Fernando J, Zhou BB, Zhang Z, Zomaya AY. Sample Subset Optimization Techniques for Imbalanced and Ensemble Learning Problems in Bioinformatics Applications. IEEE TRANSACTIONS ON CYBERNETICS 2014; 44:445-55. [PMID: 24108722 DOI: 10.1109/tcyb.2013.2257480] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 05/20/2023]
Abstract
Data sampling is a widely used technique in a broad range of machine learning problems. Traditional sampling approaches generally rely on random resampling from a given dataset. However, these approaches do not take into consideration additional information, such as sample quality and usefulness. We recently proposed a data sampling technique, called sample subset optimization (SSO). The SSO technique relies on a cross-validation procedure for identifying and selecting the most useful samples as subsets. In this paper, we describe the application of SSO techniques to imbalanced and ensemble learning problems, respectively. For imbalanced learning, the SSO technique is employed as an under-sampling technique for identifying a subset of highly discriminative samples in the majority class. In ensemble learning, the SSO technique is utilized as a generic ensemble technique where multiple optimized subsets of samples from each class are selected for building an ensemble classifier. We demonstrate the utilities and advantages of the proposed techniques on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.
|
385
|
|
386
|
GaneshKumar P, Rani C, Devaraj D, Victoire TAA. Hybrid Ant Bee Algorithm for Fuzzy Expert System Based Sample Classification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:347-360. [PMID: 26355782 DOI: 10.1109/tcbb.2014.2307325] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Accuracy maximization and complexity minimization are the two main goals of a fuzzy expert system based microarray data classification. Our previous Genetic Swarm Algorithm (GSA) approach has improved the classification accuracy of the fuzzy expert system at the cost of its interpretability. The if-then rules produced by the GSA are lengthy and complex, which makes them difficult for the physician to understand. To address this interpretability-accuracy tradeoff, the rule set is represented using integer numbers and the task of rule generation is treated as a combinatorial optimization task. Ant colony optimization (ACO) with local and global pheromone updates is applied to find the fuzzy partition based on the gene expression values for generating a simpler rule set. In order to address the formless and continuous expression values of a gene, this paper employs the artificial bee colony (ABC) algorithm to evolve the points of the membership function. Mutual information is used for identification of informative genes. The performance of the proposed hybrid Ant Bee Algorithm (ABA) is evaluated using six gene expression data sets. From the simulation study, it is found that the proposed approach generated an accurate fuzzy system with highly interpretable and compact rules for all the data sets when compared with other approaches.
|
387
|
Aphinyanaphongs Y, Fu LD, Li Z, Peskin ER, Efstathiadis E, Aliferis CF, Statnikov A. A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization. J Assoc Inf Sci Technol 2014. [DOI: 10.1002/asi.23110] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 01/06/2023]
Affiliation(s)
- Yindalon Aphinyanaphongs
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
- Department of Medicine; New York University School of Medicine; 550 First Avenue New York NY 10016
| | - Lawrence D. Fu
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
- Department of Medicine; New York University School of Medicine; 550 First Avenue New York NY 10016
| | - Zhiguo Li
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
| | - Eric R. Peskin
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
| | - Efstratios Efstathiadis
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
| | - Constantin F. Aliferis
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
- Department of Pathology; New York University School of Medicine; 550 First Avenue New York NY 10016
- Department of Biostatistics; Vanderbilt University; 1211 Medical Center Drive Nashville TN 37232
| | - Alexander Statnikov
- Center for Health Informatics and Bioinformatics; New York University Langone Medical Center; 227 East 30th Street New York NY 10016
- Department of Medicine; New York University School of Medicine; 550 First Avenue New York NY 10016
|
388
|
Wang H, van der Laan M. Dimension Reduction with Gene Expression Data Using Targeted Variable Importance Measurement. Bioinformatics 2014. [DOI: 10.1201/b16589-10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
|
389
|
Chen KH, Wang KJ, Tsai ML, Wang KM, Adrian AM, Cheng WC, Yang TS, Teng NC, Tan KP, Chang KS. Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm. BMC Bioinformatics 2014; 15:49. [PMID: 24555567 PMCID: PMC3944936 DOI: 10.1186/1471-2105-15-49] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 08/07/2013] [Accepted: 02/07/2014] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND: In the application of microarray data, how to select a small number of informative genes from thousands of genes that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyze gene expression data. RESULTS: To achieve efficient gene selection from thousands of candidate genes that can contribute to identifying cancers, this study aims at developing a novel method utilizing particle swarm optimization combined with a decision tree as the classifier. This study also compares the performance of our proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conducts experiments on 11 gene expression cancer datasets. CONCLUSION: Based on statistical analysis, our proposed method outperforms other popular classifiers for all test datasets, and is comparable to SVM for certain specific datasets. Further, the housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide a high discrimination power on cancer classification.
Affiliation(s)
- Kun-Huang Chen
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C
| | - Kung-Jeng Wang
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C
| | - Min-Lung Tsai
- Department of Food Science, Yuanpei University, No. 306, Yuanpei Street, Hsinchu 300, Taiwan, R.O.C
| | - Kung-Min Wang
- Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, R.O.C
| | - Angelia Melani Adrian
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C
| | - Wei-Chung Cheng
- Pediatric Neurosurgery, Department of Surgery, Cheng Hsin General Hospital, Taipei 11220, Taiwan, R.O.C
- Genomic Research Center, National Yang-Ming University, Taipei 11221, Taiwan, R.O.C
| | - Tzu-Sen Yang
- School of Dental Technology, Taipei Medical University, Taipei 110, Taiwan, R.O.C
- Taiwan Research Center for Biomedical Implants and Microsurgery Devices, Taipei Medical University Taipei 110, Taiwan, R.O.C
| | - Nai-Chia Teng
- School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei, Taiwan, R.O.C
| | - Kuo-Pin Tan
- MBA, School of Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C
| | - Ku-Shang Chang
- Department of Food Science, Yuanpei University, No. 306, Yuanpei Street, Hsinchu 300, Taiwan, R.O.C
|
390
|
Choi W, Porten S, Kim S, Willis D, Plimack ER, Hoffman-Censits J, Roth B, Cheng T, Tran M, Lee IL, Melquist J, Bondaruk J, Majewski T, Zhang S, Pretzsch S, Baggerly K, Siefker-Radtke A, Czerniak B, Dinney CPN, McConkey DJ. Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy. Cancer Cell 2014; 25:152-65. [PMID: 24525232 PMCID: PMC4011497 DOI: 10.1016/j.ccr.2014.01.009] [Citation(s) in RCA: 1271] [Impact Index Per Article: 115.5] [Received: 12/19/2012] [Revised: 10/17/2013] [Accepted: 01/13/2014] [Indexed: 12/11/2022]
Abstract
Muscle-invasive bladder cancers (MIBCs) are biologically heterogeneous and have widely variable clinical outcomes and responses to conventional chemotherapy. We discovered three molecular subtypes of MIBC that resembled established molecular subtypes of breast cancer. Basal MIBCs shared biomarkers with basal breast cancers and were characterized by p63 activation, squamous differentiation, and more aggressive disease at presentation. Luminal MIBCs contained features of active PPARγ and estrogen receptor transcription and were enriched with activating FGFR3 mutations and potential FGFR inhibitor sensitivity. p53-like MIBCs were consistently resistant to neoadjuvant methotrexate, vinblastine, doxorubicin and cisplatin chemotherapy, and all chemoresistant tumors adopted a p53-like phenotype after therapy. Our observations have important implications for prognostication, the future clinical development of targeted agents, and disease management with conventional chemotherapy.
MESH Headings
- Aged
- Antineoplastic Combined Chemotherapy Protocols/therapeutic use
- Biomarkers, Tumor/genetics
- Blotting, Western
- Carcinoma, Basal Cell/drug therapy
- Carcinoma, Basal Cell/pathology
- Carcinoma, Squamous Cell/drug therapy
- Carcinoma, Squamous Cell/pathology
- Cell Differentiation
- Cell Proliferation
- Cisplatin/administration & dosage
- Clinical Trials, Phase II as Topic
- Cohort Studies
- Doxorubicin/administration & dosage
- Drug Resistance, Neoplasm/genetics
- Female
- Gene Expression Profiling
- Humans
- Male
- Methotrexate/administration & dosage
- MicroRNAs/genetics
- Muscle Neoplasms/classification
- Muscle Neoplasms/drug therapy
- Muscle Neoplasms/pathology
- Mutation/genetics
- Neoadjuvant Therapy
- Neoplasm Invasiveness
- Neoplasm Staging
- PPAR gamma/genetics
- PPAR gamma/metabolism
- Prognosis
- RNA, Messenger/genetics
- Real-Time Polymerase Chain Reaction
- Receptor, Fibroblast Growth Factor, Type 3/genetics
- Receptor, Fibroblast Growth Factor, Type 3/metabolism
- Receptors, Estrogen/genetics
- Receptors, Estrogen/metabolism
- Reverse Transcriptase Polymerase Chain Reaction
- Tumor Suppressor Protein p53/genetics
- Urinary Bladder Neoplasms/classification
- Urinary Bladder Neoplasms/drug therapy
- Urinary Bladder Neoplasms/pathology
- Vinblastine/administration & dosage
Affiliation(s)
- Woonyoung Choi
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Sima Porten
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Seungchan Kim
- Computational Biology Division, Translational Genomics Research Institute, 445N, Fifth Street, Phoenix, AZ 85004, USA
| | - Daniel Willis
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Elizabeth R Plimack
- Department of Medical Oncology, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111-2497, USA
| | - Jean Hoffman-Censits
- Department of Medical Oncology, Thomas Jefferson University Hospital, 1025 Walnut Street, Suite 700, Philadelphia, PA 19107, USA
| | - Beat Roth
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Tiewei Cheng
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; The University of Texas-Graduate School of Biomedical Sciences (GSBS) at Houston, Houston, TX 77030, USA
| | - Mai Tran
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; The University of Texas-Graduate School of Biomedical Sciences (GSBS) at Houston, Houston, TX 77030, USA
| | - I-Ling Lee
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jonathan Melquist
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jolanta Bondaruk
- Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Tadeusz Majewski
- Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Shizhen Zhang
- Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Shanna Pretzsch
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Keith Baggerly
- Department of Bioinformatics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Arlene Siefker-Radtke
- Department of Genitourinary Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Bogdan Czerniak
- Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Colin P N Dinney
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - David J McConkey
- Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; The University of Texas-Graduate School of Biomedical Sciences (GSBS) at Houston, Houston, TX 77030, USA.
|
391
|
Leiva R, Roy A. Classification of Higher-order Data with Separable Covariance and Structured Multiplicative or Additive Mean Models. COMMUN STAT-THEOR M 2014. [DOI: 10.1080/03610926.2013.841931] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
|
392
|
Ren CX, Dai DQ, Li XX, Lai ZR. Band-Reweighed Gabor Kernel Embedding for Face Image Representation and Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2014; 23:725-740. [PMID: 26270914 DOI: 10.1109/tip.2013.2292560] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 06/04/2023]
Abstract
Face recognition with illumination or pose variation is a challenging problem in image processing and pattern recognition. A novel algorithm using band-reweighed Gabor kernel embedding to deal with the problem is proposed in this paper. For a given image, it is first transformed by a group of Gabor filters, which output Gabor features using different orientation and scale parameters. Fisher scoring function is used to measure the importance of features in each band, and then, the features with the largest scores are preserved for saving memory requirements. The reduced bands are combined by a vector, which is determined by a weighted kernel discriminant criterion and solved by a constrained quadratic programming method, and then, the weighted sum of these nonlinear bands is defined as the similarity between two images. Compared with existing concatenation-based Gabor feature representation and the uniformly weighted similarity calculation approaches, our method provides a new way to use Gabor features for face recognition and presents a reasonable interpretation for highlighting discriminant orientations and scales. The minimum Mahalanobis distance considering the spatial correlations within the data is exploited for feature matching, and the graphical lasso is used therein for directly estimating the sparse inverse covariance matrix. Experiments using benchmark databases show that our new algorithm improves the recognition results and obtains competitive performance.
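The feature-ranking step mentioned in this abstract can be illustrated with the textbook Fisher score, the ratio of between-class to within-class variance for a single feature. This is a minimal sketch under that assumption; the paper's exact scoring function may differ, and the name `fisher_score` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Hashable, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Textbook Fisher score of one feature (assumed form, not the paper's).

    score = sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c
    """
    overall = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        between += len(group) * (fmean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    # A feature with zero within-class spread separates the classes perfectly.
    return between / within if within > 0 else float("inf")


# Hypothetical toy data: one well-separated feature, one uninformative feature.
labels = list("aaabbb")
print(fisher_score([1, 2, 3, 7, 8, 9], labels))
print(fisher_score([1, 2, 3, 1, 2, 3], labels))
```

Features would then be ranked by this score and only the highest-scoring ones kept, which matches the memory-saving motivation stated in the abstract.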
|
393
|
|
394
|
|
395
|
Applications of Bayesian gene selection and classification with mixtures of generalized singular g-priors. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014; 2013:420412. [PMID: 24382981 PMCID: PMC3870637 DOI: 10.1155/2013/420412] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/04/2013] [Revised: 11/10/2013] [Accepted: 11/10/2013] [Indexed: 11/17/2022]
Abstract
Recent advancement in microarray technologies has led to a collection of an enormous number of genetic markers in disease association studies, and yet scientists are interested in selecting a smaller set of genes to explore the relation between genes and disease. Current approaches either adopt a single marker test which ignores the possible interaction among genes or consider a multistage procedure that reduces the large size of genes before evaluation of the association. Among the latter, Bayesian analysis can further accommodate the correlation between genes through the specification of a multivariate prior distribution and estimate the probabilities of association through latent variables. The covariance matrix, however, depends on an unknown parameter. In this research, we suggested a reference hyperprior distribution for such uncertainty, outlined the implementation of its computation, and illustrated this fully Bayesian approach with a colon and leukemia cancer study. Comparison with other existing methods was also conducted. The classification accuracy of our proposed model is higher with a smaller set of selected genes. The results not only replicated findings in several earlier studies, but also provided the strength of association with posterior probabilities.
|
396
|
Yang M, Li X, Li Z, Ou Z, Liu M, Liu S, Li X, Yang S. Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data. PLoS One 2013; 8:e84253. [PMID: 24386356 PMCID: PMC3875537 DOI: 10.1371/journal.pone.0084253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/03/2013] [Accepted: 11/12/2013] [Indexed: 11/28/2022] Open
Abstract
Motivation: DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Results: Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.
Affiliation(s)
- Mingxing Yang
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Department of Electronic Science, School of Physics and Mechanical & Electrical Engineering, Xiamen University, Xiamen, China
| | - Xiumin Li
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
| | - Zhibin Li
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
| | - Zhimin Ou
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
| | - Ming Liu
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
| | - Suhuan Liu
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
| | - Xuejun Li
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Department of Endocrinology and Diabetes, the First Affiliated Hospital of Xiamen University, Xiamen, China
- * E-mail: (SY); (Xuejun Li)
| | - Shuyu Yang
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- * E-mail: (SY); (Xuejun Li)
|
397
|
Multiclass prediction with partial least square regression for gene expression data: applications in breast cancer intrinsic taxonomy. BIOMED RESEARCH INTERNATIONAL 2013; 2013:248648. [PMID: 24490149 PMCID: PMC3893734 DOI: 10.1155/2013/248648] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/24/2013] [Accepted: 11/23/2013] [Indexed: 01/26/2023]
Abstract
Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to the many more samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
|
398
|
Comparison of different EHG feature selection methods for the detection of preterm labor. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:485684. [PMID: 24454536 PMCID: PMC3884970 DOI: 10.1155/2013/485684] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 06/29/2013] [Revised: 10/11/2013] [Accepted: 11/04/2013] [Indexed: 12/03/2022]
Abstract
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The last two methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification.
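Sequential forward selection, one of the three methods named in this abstract, can be sketched generically as a greedy search. The sketch below assumes a caller-supplied `score` function (for a wrapper method this would typically be a classifier's validation accuracy on the candidate subset); the function names, the stopping rule, and the toy criterion are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> List[int]:
    """Greedily add the feature whose inclusion yields the best score."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        # Stop early when no extension improves on the current subset.
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical toy criterion: features 1 and 3 help, every other feature hurts.
def toy_score(subset: Sequence[int]) -> float:
    useful = {1: 0.5, 3: 0.3}
    return sum(useful.get(f, -0.1) for f in subset)


print(sequential_forward_selection(5, toy_score, max_features=4))  # → [1, 3]
```

Because each round only extends the current subset, the search evaluates far fewer subsets than exhaustive enumeration, at the cost of possibly missing features that are only useful in combination.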
|
399
|
Jaffe AE, Storey JD, Ji H, Leek JT. Gene set bagging for estimating the probability a statistically significant result will replicate. BMC Bioinformatics 2013; 14:360. [PMID: 24330332 PMCID: PMC3890500 DOI: 10.1186/1471-2105-14-360] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/13/2013] [Accepted: 11/11/2013] [Indexed: 11/11/2022] Open
Abstract
Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called gene set bagging, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples.
Results: Using both simulated and publicly-available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. We show our method estimates the replication probability (R), the probability that a gene set will replicate as a significant result in future studies, and show in simulations that this method reflects replication better than each set’s p-value.
Conclusions: Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.
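The resampling procedure described in this abstract can be illustrated with a short sketch. It assumes that the samples are resampled with replacement (as the term "bagging" suggests) and that R is estimated as the fraction of resampled datasets in which a gene set is called significant; the abstract does not spell out either detail. The function name, the `significant_sets` callback, and the toy analysis are hypothetical stand-ins for a real gene-set analysis.

```python
import random
from collections import Counter


def gene_set_bagging(samples, significant_sets, n_bags=100, seed=0):
    """Estimate a replication probability R for each gene set.

    `samples` is a list of per-sample records; `significant_sets` is a
    user-supplied function that runs a gene-set analysis on a resampled
    list and returns the names of the sets it calls significant.
    Sets that are never significant do not appear in the result.
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(samples)
    for _ in range(n_bags):
        # Bootstrap: draw n samples with replacement (an assumption).
        bag = [samples[rng.randrange(n)] for _ in range(n)]
        counts.update(set(significant_sets(bag)))
    return {name: hits / n_bags for name, hits in counts.items()}


def toy_analysis(bag):
    """Hypothetical analysis: one set is always significant, one depends on the bag."""
    hits = {"stable_set"}
    if sum(bag) / len(bag) > 0.5:
        hits.add("unstable_set")
    return hits


replication = gene_set_bagging([0, 1, 0, 1, 1, 0, 1, 0], toy_analysis, n_bags=200, seed=1)
```

In this toy run, `stable_set` receives R = 1.0 because it is significant in every bag, while `unstable_set` receives an intermediate value, which mirrors the abstract's point that a set significant in the original data need not be stable under resampling.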
Affiliation(s)
- Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD 21205, USA.
|
400
|
Xia XL, Xing H, Liu X. Analyzing kernel matrices for the identification of differentially expressed genes. PLoS One 2013; 8:e81683. [PMID: 24349110 PMCID: PMC3857896 DOI: 10.1371/journal.pone.0081683] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/10/2013] [Accepted: 10/15/2013] [Indexed: 11/21/2022] Open
Abstract
One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the use of state-of-the-art learning machines, in particular Support Vector Machines (SVM). The SVM is a typical sample-based classifier whose performance comes down to how discriminant the samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method, namely Kernel Matrix Gene Selection (KMGS), is proposed. The rationale of the method, which is rooted in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS), which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate.
Affiliation(s)
- Xiao-Lei Xia
- School of Mechanical and Electrical Engineering, Jiaxing University, Jiaxing, P.R. China
- Huanlai Xing
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu, P.R. China
- Xueqin Liu
- School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, United Kingdom
|