1
Hussain I, Qureshi M, Ismail M, Iftikhar H, Zywiołek J, López-Gonzales JL. Optimal features selection in the high dimensional data based on robust technique: Application to different health database. Heliyon 2024; 10:e37241. [PMID: 39296019 PMCID: PMC11408077 DOI: 10.1016/j.heliyon.2024.e37241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 08/28/2024] [Accepted: 08/29/2024] [Indexed: 09/21/2024] Open
Abstract
Bioinformatics and gene expression analysis face major hurdles when dealing with high-dimensional data, where the number of variables or genes far outweighs the number of samples. These difficulties are exacerbated, particularly in microarray data processing, by redundant genes that do not significantly contribute to the response variable. To address this issue, gene selection emerges as a feasible method for identifying the most important genes, hence reducing the generalization error of classification algorithms. This paper introduces a new hybrid approach for gene selection by combining the Signal-to-Noise Ratio (SNR) score with the robust Mood median test. The Mood median test is beneficial for reducing the impact of outliers in non-normal or skewed data since it can identify genes with significant changes across groups. The SNR score measures the significance of a gene for classification by comparing the gap between class means with the within-class variability. By integrating both of these approaches, the suggested approach aims to find genes that are significant for classification tasks. The major objective of this study is to evaluate the effectiveness of this combined approach in choosing the optimal genes. A P-value is computed for each gene using the Mood median test, alongside its SNR score. By dividing the SNR value of each gene by its P-value, the Md score is calculated. Genes with a high SNR are considered favorable due to their minimal noise influence and significant classification importance. To verify the effectiveness of the selected genes, the study utilizes two dependable classification techniques: Random Forest and K-Nearest Neighbors (KNN). These algorithms were chosen due to their track record in classification tasks. The performance of the selected genes is evaluated using two metrics: error reduction and classification accuracy. These metrics offer an in-depth assessment of how well the selected genes improve classification accuracy and consistency. According to the findings, the hybrid approach proposed here outperforms conventional gene selection methods in high-dimensional datasets and has lower classification error rates. There are considerable improvements in classification accuracy and error reduction when the selected genes are passed to the Random Forest and KNN classifiers. The outcomes demonstrate how this hybrid technique might be a helpful tool to improve gene selection processes in bioinformatics.
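The score described in this abstract (SNR divided by the Mood median test P-value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SNR form `|mean1 - mean2| / (sd1 + sd2)`, the two-class setting, the chi-square test without continuity correction, and the function names `snr`, `mood_median_p` and `md_score` are all assumptions made here, and the expression values are made up.

```python
import math
from statistics import mean, median, stdev


def snr(a, b):
    """Gap between class means divided by the summed within-class SDs (assumed form)."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))


def mood_median_p(a, b):
    """Two-group Mood median test p-value (chi-square, 1 df, no continuity correction)."""
    grand_median = median(a + b)
    above = [sum(x > grand_median for x in g) for g in (a, b)]
    table = [above, [len(a) - above[0], len(b) - above[1]]]
    n = len(a) + len(b)
    rows = [sum(r) for r in table]
    cols = [len(a), len(b)]
    chi2 = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def md_score(a, b):
    """SNR divided by the Mood median p-value, following the abstract's description."""
    return snr(a, b) / mood_median_p(a, b)


# Hypothetical expression values for two genes across two classes.
informative = md_score([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
noisy = md_score([1, 6, 3, 8, 5], [2, 7, 4, 9, 10])
print(informative > noisy)
```

Because a small P-value sits in the denominator, a gene that separates the classes well is rewarded twice: once through a large SNR and once through a small P-value. The sketch does not guard against a zero within-class SD or against all values being tied at the grand median; real data would need both checks.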
Affiliation(s)
- Ibrar Hussain
  - Department of Statistics, Abdul Wali Khan University Mardan, Pakistan
- Moiz Qureshi
  - Govt Boys Degree College Tandojam, Hyderabad, Sindh, Pakistan
  - Department of Statistics, Quaid-i-Azam University, 45320, Islamabad, Pakistan
- Muhammad Ismail
  - College of Statistical Sciences, University of the Punjab, Lahore, Pakistan
  - Department of Statistics, Quaid-i-Azam University, 45320, Islamabad, Pakistan
- Hasnain Iftikhar
  - Department of Statistics, Quaid-i-Azam University, 45320, Islamabad, Pakistan
  - Escuela de Posgrado, Universidad Peruana Unión, Lima, Peru
- Justyna Zywiołek
  - Faculty of Management, Czestochowa University of Technology, Czestochowa, 42-200, Poland
2
Bitto V, Hönscheid P, Besso MJ, Sperling C, Kurth I, Baumann M, Brors B. Enhancing mass spectrometry imaging accessibility using convolutional autoencoders for deriving hypoxia-associated peptides from tumors. NPJ Syst Biol Appl 2024; 10:57. [PMID: 38802379 PMCID: PMC11130291 DOI: 10.1038/s41540-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024] Open
Abstract
Mass spectrometry imaging (MSI) allows the study of cancer's intratumoral heterogeneity through spatially resolved peptides, metabolites and lipids. Yet, in biomedical research, MSI is rarely used for biomarker discovery. Besides its high dimensionality and multicollinearity, mass spectrometry (MS) technologies typically output mass-to-charge ratio values but not the biochemical compounds of interest. Our framework makes low-abundance signals in MSI, in particular, more accessible. We utilized convolutional autoencoders to aggregate features associated with tumor hypoxia, a parameter with significant spatial heterogeneity, in cancer xenograft models. We highlight that MSI captures these low-abundance signals and that autoencoders can preserve them in their latent space. The relevance of individual hyperparameters is demonstrated through ablation experiments, and the contribution from original features to latent features is unraveled. Complementing MSI with tandem MS from the same tumor model, multiple hypoxia-associated peptide candidates were derived. Compared to random forests alone, our autoencoder approach yielded more biologically relevant insights for biomarker discovery.
Affiliation(s)
- Verena Bitto
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Division of Radiooncology/Radiobiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Heidelberg, Germany
  - Faculty for Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
- Pia Hönscheid
  - National Center for Tumor Diseases (NCT), Partner Site Dresden, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - University Hospital Carl Gustav Carus (UKD), Technische Universität Dresden, Institute of Pathology, Dresden, Germany
  - Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- María José Besso
  - Division of Radiooncology/Radiobiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christian Sperling
  - National Center for Tumor Diseases (NCT), Partner Site Dresden, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - University Hospital Carl Gustav Carus (UKD), Technische Universität Dresden, Institute of Pathology, Dresden, Germany
  - Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Ina Kurth
  - Division of Radiooncology/Radiobiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - OncoRay - National Center for Radiation Research in Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Helmholtz-Zentrum Dresden - Rossendorf, Dresden, Germany
  - German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
- Michael Baumann
  - Division of Radiooncology/Radiobiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - OncoRay - National Center for Radiation Research in Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Helmholtz-Zentrum Dresden - Rossendorf, Dresden, Germany
  - German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
- Benedikt Brors
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
  - National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - Medical Faculty Heidelberg and Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
3
Mohammed I, Elbashir MK, Faggad AS. Singular Value Decomposition-Based Penalized Multinomial Regression for Classifying Imbalanced Medulloblastoma Subgroups Using Methylation Data. J Comput Biol 2024; 31:458-471. [PMID: 38752890 DOI: 10.1089/cmb.2023.0198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024] Open
Abstract
Medulloblastoma (MB) is a molecularly heterogeneous brain malignancy with large differences in clinical presentation. According to genomic studies, there are at least four distinct molecular subgroups of MB: sonic hedgehog (SHH), wingless/INT (WNT), Group 3, and Group 4. The treatment and outcomes depend on appropriate classification. It is difficult for classification algorithms to identify these subgroups from an imbalanced MB genomic data set, where the distribution of samples among the MB subgroups may not be equal. To overcome this problem, we used singular value decomposition (SVD) and group lasso techniques to find DNA methylation probe features that maximize the separation between the different imbalanced MB subgroups. We used multinomial regression as a classification method to classify the four different molecular subgroups of MB using the reduced DNA methylation data. Coordinate descent is used to minimize our loss function with the group lasso penalty, which promotes sparsity. By using SVD, we were able to reduce the 321,174 probe features to just 200 features. Fewer than 40 features were selected after applying the group lasso, which we then used as predictors for our classification models. Our proposed method achieved an average overall accuracy of 99% based on fivefold cross-validation. Our approach produces improved classification performance compared with the state-of-the-art methods for classifying MB molecular subgroups.
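To illustrate why a group lasso penalty removes whole features rather than single coefficients, the sketch below shows the standard block soft-thresholding update that group-wise coordinate descent commonly applies. It is not the authors' code: treating the four per-subgroup coefficients of one feature as a group is an assumption made here, and the coefficient values, the feature names and the penalty `lam` are hypothetical.

```python
import math


def group_soft_threshold(group, lam):
    """Shrink a coefficient group toward zero; zero it entirely if its norm <= lam."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * w for w in group]


# Hypothetical coefficients: one entry per feature, one value per MB subgroup (4 classes).
coefs = {
    "feature_1": [0.9, -0.4, 0.3, -0.8],
    "feature_2": [0.05, -0.02, 0.01, 0.03],
}
shrunk = {name: group_soft_threshold(w, lam=0.1) for name, w in coefs.items()}

# A feature survives only if at least one of its class coefficients is non-zero.
selected = [name for name, w in shrunk.items() if any(w)]
print(selected)
```

A feature whose coefficient vector has a Euclidean norm below the penalty is dropped for every class at once, which is how a 200-feature SVD representation could be pruned to a few dozen predictors.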
Affiliation(s)
- Isra Mohammed
  - Department of Statistics, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, Sudan
- Murtada K Elbashir
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
  - Department of Computer Science, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, Sudan
- Areeg S Faggad
  - Department of Molecular Biology, National Cancer Institute-University of Gezira, Wad Madani, Sudan
4
Amanzholova A, Coşkun A. Enhancing cancer stage prediction through hybrid deep neural networks: a comparative study. Front Big Data 2024; 7:1359703. [PMID: 38586474 PMCID: PMC10995364 DOI: 10.3389/fdata.2024.1359703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 02/20/2024] [Indexed: 04/09/2024] Open
Abstract
Efficiently detecting and treating cancer at an early stage is crucial to improve the overall treatment process and mitigate the risk of disease progression. In the realm of research, the utilization of artificial intelligence technologies holds significant promise for enhancing advanced cancer diagnosis. Nonetheless, a notable hurdle arises when striving for precise cancer-stage diagnoses through the analysis of gene sets. Issues such as limited sample volumes, data dispersion, overfitting, and the use of linear classifiers with simple parameters hinder prediction performance. This study introduces an innovative approach for predicting early and late-stage cancers by integrating hybrid deep neural networks. A deep neural network classifier, developed using the open-source TensorFlow library and Keras network, incorporates a novel method that combines genetic algorithms, Extreme Learning Machines (ELM), and Deep Belief Networks (DBN). Specifically, two evolutionary techniques, DBN-ELM-BP and DBN-ELM-ELM, are proposed and evaluated using data from The Cancer Genome Atlas (TCGA), encompassing mRNA expression, miRNA levels, DNA methylation, and clinical information. The models demonstrate outstanding prediction accuracy (89.35%-98.75%) in distinguishing between early- and late-stage cancers. Comparative analysis against existing methods in the literature using the same cancer dataset reveals the superiority of the proposed hybrid method, highlighting its enhanced accuracy in cancer stage prediction.
Affiliation(s)
- Alina Amanzholova
  - Graduate School of Natural and Applied Sciences, Department of Computer Engineering, Gazi University, Ankara, Türkiye
  - Khoja Akhmet Yassawi International Kazakh-Turkish University, Faculty of Engineering, Department of Computer Engineering, Turkistan, Kazakhstan
- Aysun Coşkun
  - Department of Computer Engineering, Faculty of Technology, Gazi University, Ankara, Türkiye
5
Tessmann R, Elbert R. Multi-sided platforms in competitive B2B networks with varying governmental influence - a taxonomy of Port and Cargo Community System business models. Electronic Markets 2022; 32:829-872. [PMID: 35602111 PMCID: PMC9040361 DOI: 10.1007/s12525-022-00529-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2021] [Accepted: 02/03/2022] [Indexed: 06/15/2023]
Abstract
Our knowledge on differences in business model characteristics of thriving and failing Multi-Sided Platforms in competitive B2B networks (B2B-MSP) and potential influences of increasing governmental involvement remains fragmented. This study develops a taxonomy to classify special B2B-MSP with varying governmental influence in the supply chain and transportation context, viz. Port and Cargo Community Systems (CS). Based on the classification of 44 international CS, we identify four archetypes using cluster analysis. The taxonomy provides practitioners with a differentiated view on the configuration options of CS business models including the involvement of governmental institutions, while the presented archetypes contribute an aggregated view of CS business models. The statistical analysis of our results provides initial explanatory approaches on CS business model dimension interdependencies, thereby laying the basis for a deeper understanding of sectoral and geographic differences of B2B-MSP and their diffusion dynamics as well as facilitating a higher contextualization of future research.
Affiliation(s)
- Ruben Tessmann
  - Technical University of Darmstadt, Hochschulstraße 1, 64289 Darmstadt, Germany
- Ralf Elbert
  - Technical University of Darmstadt, Hochschulstraße 1, 64289 Darmstadt, Germany
6
An empirical comparison between stochastic and deterministic centroid initialisation for K-means variations. Mach Learn 2021. [DOI: 10.1007/s10994-021-06021-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known that it suffers from a series of disadvantages: it is only able to find local minima, and the positions of the initial clustering centres (centroids) can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations along with a range of deterministic and stochastic initialisation techniques. We show that, on average, more sophisticated initialisation techniques alleviate the need for complex clustering methods. Furthermore, deterministic methods perform better than stochastic methods. However, there is a trade-off: less sophisticated stochastic methods, executed multiple times, can result in better clustering. Factoring in execution time, deterministic methods can be competitive and result in a good clustering solution. These conclusions are obtained through extensive benchmarking using a range of synthetic model generators and real-world data sets.
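The trade-off described here, one deterministic start versus several random restarts, can be sketched with a small pure-Python K-Means. The maximin (farthest-first) initialisation below is only an illustrative deterministic scheme and is not necessarily one of the techniques benchmarked in the paper; the function names, the toy data and the seed are assumptions.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iters=50):
    """Lloyd's algorithm from the given starting centroids; returns (centroids, SSE)."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def maximin_init(points, k):
    """Deterministic: start at the point nearest the data mean, then add farthest points."""
    centre = tuple(sum(c) / len(points) for c in zip(*points))
    chosen = [min(points, key=lambda p: sq_dist(p, centre))]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen


def best_of_random(points, k, restarts, seed=0):
    """Stochastic: run K-Means from random starting points and keep the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(points, rng.sample(points, k)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
det_centroids, det_sse = kmeans(points, maximin_init(points, 3))
rand_centroids, rand_sse = best_of_random(points, 3, restarts=20)
print(det_sse, rand_sse)
```

The deterministic start costs one run, while the stochastic variant pays for every restart, which is the execution-time consideration the abstract raises.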
7
Özcan Şimşek NÖ, Özgür A, Gürgen F. A novel gene selection method for gene expression data for the task of cancer type classification. Biol Direct 2021; 16:7. [PMID: 33557857 PMCID: PMC7869482 DOI: 10.1186/s13062-020-00290-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/17/2020] [Accepted: 12/18/2020] [Indexed: 11/27/2022] Open
Abstract
Cancer is a polygenic disease, with each cancer type having a different mutation profile. Genomic data can be utilized to detect these profiles and to diagnose and differentiate cancer types. Variant calling provides mutation information. Gene expression data reveal altered cell behaviour. The combination of the mutation and expression information can lead to accurate discrimination of different cancer types. In this study, we utilized and transferred the information of existing mutations for a novel gene selection method for gene expression data. We tested the proposed method in order to diagnose and differentiate cancer types. It is a disease-specific method, as both the mutations and expressions are filtered according to the selected cancer types. Our experimental results show that the proposed gene selection method leads to similar or improved performance metrics compared to classical feature selection methods and curated gene sets.
Affiliation(s)
- N Özlem Özcan Şimşek
  - Department of Computer Engineering, Bogazici University, Bebek, İstanbul, Turkey
- Arzucan Özgür
  - Department of Computer Engineering, Bogazici University, Bebek, İstanbul, Turkey
- Fikret Gürgen
  - Department of Computer Engineering, Bogazici University, Bebek, İstanbul, Turkey
8
Clarke R, Kraikivski P, Jones BC, Sevigny CM, Sengupta S, Wang Y. A systems biology approach to discovering pathway signaling dysregulation in metastasis. Cancer Metastasis Rev 2020; 39:903-918. [PMID: 32776157 PMCID: PMC7487029 DOI: 10.1007/s10555-020-09921-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/11/2020] [Accepted: 07/13/2020] [Indexed: 02/07/2023]
Abstract
Total metastatic burden is the primary cause of death for many cancer patients. While the process of metastasis has been studied widely, much remains to be understood. Moreover, few agents have been developed that specifically target the major steps of the metastatic cascade. Many individual genes and pathways have been implicated in metastasis but a holistic view of how these interact and cooperate to regulate and execute the process remains somewhat rudimentary. It is unclear whether all of the signaling features that regulate and execute metastasis are yet fully understood. Novel features of a complex system such as metastasis can often be discovered by taking a systems-based approach. We introduce the concepts of systems modeling and define some of the central challenges facing the application of a multidisciplinary systems-based approach to understanding metastasis and finding actionable targets therein. These challenges include appreciating the unique properties of the high-dimensional omics data often used for modeling, limitations in knowledge of the system (metastasis), tumor heterogeneity and sampling bias, and some of the issues key to understanding critical features of molecular signaling in the context of metastasis. We also provide a brief introduction to integrative modeling that focuses on both the nodes and edges of molecular signaling networks. Finally, we offer some observations on future directions as they relate to developing a systems-based model of the metastatic cascade.
Affiliation(s)
- Robert Clarke
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
  - Hormel Institute and Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Austin, MN, 55912, USA
- Pavel Kraikivski
  - Academy of Integrated Science, Division of Systems Biology, Virginia Polytechnic and State University, Blacksburg, VA, 24061, USA
- Brandon C Jones
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Catherine M Sevigny
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Surojeet Sengupta
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Yue Wang
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
9
Gonçalves ANA, Lever M, Russo PST, Gomes-Correia B, Urbanski AH, Pollara G, Noursadeghi M, Maracaja-Coutinho V, Nakaya HI. Assessing the Impact of Sample Heterogeneity on Transcriptome Analysis of Human Diseases Using MDP Webtool. Front Genet 2019; 10:971. [PMID: 31708960 PMCID: PMC6822058 DOI: 10.3389/fgene.2019.00971] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/09/2019] [Accepted: 09/11/2019] [Indexed: 11/13/2022] Open
Abstract
Transcriptome analyses have increased our understanding of the molecular mechanisms underlying human diseases. Most approaches aim to identify significant genes by comparing their expression values between healthy subjects and a group of patients with a certain disease. Given that studies normally contain few samples, the heterogeneity among individuals caused by environmental factors or undetected illnesses can impact gene expression analyses. We present a systematic analysis of sample heterogeneity in a variety of gene expression studies relating to inflammatory and infectious diseases and show that novel immunological insights may arise once heterogeneity is addressed. The perturbation score of samples is quantified using nonperturbed subjects (i.e., healthy subjects) as a reference group. Such a score allows us to detect outlying samples and subgroups of diseased patients and even assess the molecular perturbation of single cells infected with viruses. We also show how removal of outlying samples can improve the "signal" of the disease and impact detection of differentially expressed genes. The method is made available via the mdp Bioconductor R package and as a user-friendly webtool, webMDP, available at http://mdp.sysbio.tools.
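The idea of scoring each sample against a healthy reference group can be sketched as below. This is only one plausible reading of the abstract, using a mean absolute z-score relative to the controls; the actual mdp package may use different statistics, gene filtering or thresholds. The function name `perturbation_scores`, the data layout and all values are hypothetical.

```python
from statistics import mean, stdev


def perturbation_scores(expression, control_ids):
    """expression: {gene: {sample_id: value}}. Returns {sample_id: mean |z| vs. controls}."""
    samples = list(next(iter(expression.values())))
    totals = {s: 0.0 for s in samples}
    for values in expression.values():
        reference = [values[s] for s in control_ids]
        mu, sd = mean(reference), stdev(reference)  # assumes sd > 0 for every gene
        for s in samples:
            totals[s] += abs(values[s] - mu) / sd
    return {s: total / len(expression) for s, total in totals.items()}


# Hypothetical data: three healthy subjects (h1-h3) and two patients (p1, p2).
expression = {
    "geneA": {"h1": 5.0, "h2": 5.2, "h3": 4.8, "p1": 5.1, "p2": 9.0},
    "geneB": {"h1": 2.0, "h2": 2.1, "h3": 1.9, "p1": 2.0, "p2": 4.0},
}
scores = perturbation_scores(expression, control_ids=["h1", "h2", "h3"])
most_perturbed = max(scores, key=scores.get)
print(most_perturbed)
```

A patient such as the hypothetical `p1`, whose score is close to those of the healthy subjects, is the kind of sample that could be flagged as atypical for its disease group before differential expression analysis.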
Affiliation(s)
- André N A Gonçalves
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Melissa Lever
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Pedro S T Russo
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Bruno Gomes-Correia
  - Advanced Center for Chronic Diseases-ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Alysson H Urbanski
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Gabriele Pollara
  - Division of Infection and Immunity, University College London, London, United Kingdom
- Mahdad Noursadeghi
  - Division of Infection and Immunity, University College London, London, United Kingdom
- Vinicius Maracaja-Coutinho
  - Advanced Center for Chronic Diseases-ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Helder I Nakaya
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
  - Scientific Platform Pasteur-USP, São Paulo, Brazil
10
Yang ZY, Liu XY, Shu J, Zhang H, Ren YQ, Xu ZB, Liang Y. Multi-view based integrative analysis of gene expression data for identifying biomarkers. Sci Rep 2019; 9:13504. [PMID: 31534156 PMCID: PMC6751173 DOI: 10.1038/s41598-019-49967-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/08/2019] [Accepted: 08/30/2019] [Indexed: 01/05/2023] Open
Abstract
The widespread application of microarray technology has produced a vast quantity of publicly available gene expression datasets. However, analysis of gene expression data using biostatistics and machine learning approaches is a challenging task due to (1) high noise; (2) small sample size with high dimensionality; (3) batch effects and (4) low reproducibility of significant biomarkers. These issues reveal the complexity of gene expression data, thus significantly obstructing the use of microarray technology in clinical applications. Integrative analysis offers an opportunity to address these issues and provides a more comprehensive understanding of the biological systems, but current methods have several limitations. This work leverages state-of-the-art machine learning developments for the integration of multiple gene expression datasets, classification and identification of significant biomarkers. We design a novel integrative framework, MVIAm - Multi-View based Integrative Analysis of microarray data for identifying biomarkers. It applies multiple cross-platform normalization methods to aggregate multiple datasets into a multi-view dataset and utilizes a robust learning mechanism, Multi-View Self-Paced Learning (MVSPL), for gene selection in cancer classification problems. We demonstrate the capabilities of MVIAm using simulated data and studies of breast cancer and lung cancer; it can be applied flexibly and is an effective tool for facing the four challenges of gene expression data analysis. Our proposed model makes microarray integrative analysis more systematic and expands its range of applications.
Affiliation(s)
- Zi-Yi Yang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Xiao-Ying Liu
  - Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, 519090, China
- Jun Shu
  - School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Hui Zhang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Yan-Qiong Ren
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Zong-Ben Xu
  - School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Yong Liang
  - Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
11
Abstract
Cluster analysis has been widely applied by researchers from several scientific fields over the last decades. Advances in knowledge of biological phenomena have revived a great interest in cluster analysis, due in part to the large amount of microarray data. Traditional clustering algorithms show, apart from the need for user-defined parameters, clear limitations in handling microarray data owing to its inherent characteristics: high-dimensional with low sample size, highly redundant, and noisy. That has motivated the study of clustering algorithms tailored to the task of analyzing microarray data, which continue to be developed and adapted. The present chapter is devoted to reviewing clustering methods with different cluster analysis approaches in the challenging context of microarray data. Furthermore, the validation of the clustering results is briefly discussed by means of validity indexes used to assess the goodness of the number of clusters and the induced cluster assignments.
Affiliation(s)
- Juana-María Vivo
  - Department of Statistics and Operations Research, University of Murcia, Murcia, Spain
12
Stafoggia M, Breitner S, Hampel R, Basagaña X. Statistical Approaches to Address Multi-Pollutant Mixtures and Multiple Exposures: the State of the Science. Curr Environ Health Rep 2018; 4:481-490. [PMID: 28988291 DOI: 10.1007/s40572-017-0162-z] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Indexed: 01/01/2023]
Abstract
PURPOSE OF REVIEW: The purpose of this review is to describe the most recent statistical approaches to estimate the effect of multi-pollutant mixtures or multiple correlated exposures on human health. RECENT FINDINGS: The health effects of environmental chemicals or air pollutants have been widely described. Often, there exists a complex mixture of different substances, potentially highly correlated with each other and with other (environmental) stressors. Single-exposure approaches do not allow disentangling effects of individual factors and fail to detect potential interactions between exposures. In the last years, sophisticated methods have been developed to investigate the joint or independent health effects of multi-pollutant mixtures or multiple environmental exposures. A classification of the most recent methods is proposed. A non-technical description of each method is provided, together with epidemiological applications and operational details for implementation with standard software.
Affiliation(s)
- Massimo Stafoggia
  - Department of Epidemiology, Lazio Region Health Service/ASL Roma 1, Via Cristoforo Colombo 112, 00147, Rome, Italy
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Susanne Breitner
  - Institute of Epidemiology II, Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Regina Hampel
  - Institute of Epidemiology II, Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Xavier Basagaña
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Pompeu Fabra University, Barcelona, Spain
  - Ciber on Epidemiology and Public Health (CIBERESP), Madrid, Spain
13
Yu X, Yu G, Wang J. Clustering cancer gene expression data by projective clustering ensemble. PLoS One 2017; 12:e0171429. [PMID: 28234920 PMCID: PMC5325197 DOI: 10.1371/journal.pone.0171429] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/06/2016] [Accepted: 01/20/2017] [Indexed: 11/19/2022] Open
Abstract
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but limited samples, so various projective clustering and ensemble techniques have been suggested to address these challenges. However, it is rather challenging to combine these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE improves the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data.
Affiliation(s)
- Xianxue Yu
- College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
- Guoxian Yu
- College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
- Jun Wang
- College of Computer and Information Science, Southwest University, Beibei, Chongqing, China
|
14
|
Characteristics and Validation Techniques for PCA-Based Gene-Expression Signatures. Int J Genomics 2017; 2017:2354564. [PMID: 28265563 PMCID: PMC5317117 DOI: 10.1155/2017/2354564] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/27/2016] [Revised: 12/15/2016] [Accepted: 01/04/2017] [Indexed: 11/30/2022] Open
Abstract
Background. Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. Our hypothesis is that gene signatures can be validated when applied to new datasets, using inherent properties of PCA. Results. This validation is based on four key concepts. Coherence: elements of a gene signature should be correlated beyond chance. Uniqueness: the general direction of the data being examined can drive most of the observed signal. Robustness: if a gene signature is designed to measure a single biological effect, then this signal should be sufficiently strong and distinct compared to other signals within the signature. Transferability: the derived PCA gene signature score should describe the same biology in the target dataset as it does in the training dataset. Conclusions. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those that the signatures were trained upon. Complex signatures, describing multiple independent biological components, are also easily identified.
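As a rough illustration of summarizing a gene signature into a single PCA score, the sketch below computes the first principal component of a small samples-by-genes matrix by power iteration and reports the fraction of variance it explains. The function name `pc1_score`, the power-iteration approach, and the use of explained variance as a crude stand-in for the coherence idea are all assumptions made here for illustration; this is not the validation procedure of the cited paper.

```python
import math

def pc1_score(expr, n_iter=200):
    """Hypothetical helper: expr[s][g] is the expression of signature gene g in sample s.

    Returns (scores, loadings, explained), where scores holds one PC1 score per
    sample and explained is the share of total variance carried by PC1.
    Assumes at least two samples and a non-constant matrix.
    """
    n, p = len(expr), len(expr[0])
    # Center every gene at zero mean.
    means = [sum(row[g] for row in expr) / n for g in range(p)]
    centered = [[row[g] - means[g] for g in range(p)] for row in expr]
    # Sample covariance matrix between genes.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the leading eigenvector (the PC1 loadings).
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue; the trace is total variance.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total = sum(cov[i][i] for i in range(p))
    scores = [sum(r[g] * v[g] for g in range(p)) for r in centered]
    return scores, v, eigval / total

# Toy data: three perfectly correlated genes measured in four samples.
toy = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
scores, loadings, explained = pc1_score(toy)
```

A value of `explained` near 1 suggests the signature genes move together in the dataset at hand; low values hint that a single score may not summarize the signature well.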
|
15
|
Sánchez BN, Wu M, Song PXK, Wang W. Study design in high-dimensional classification analysis. Biostatistics 2016; 17:722-36. [PMID: 27154835 DOI: 10.1093/biostatistics/kxw018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/11/2014] [Accepted: 04/03/2016] [Indexed: 12/31/2022] Open
Abstract
Advances in high throughput technology have accelerated the use of hundreds to millions of biomarkers to construct classifiers that partition patients into different clinical conditions. Prior to classifier development in actual studies, a critical need is to determine the sample size required to reach a specified classification precision. We develop a systematic approach for sample size determination in high-dimensional (large p, small n) classification analysis. Our method utilizes the probability of correct classification (PCC) as the optimization objective function and incorporates the higher criticism thresholding procedure for classifier development. Further, we derive the theoretical bound of maximal PCC gain from feature augmentation (e.g. when molecular and clinical predictors are combined in classifier development). Our methods are motivated and illustrated by a study using proteomics markers to classify post-kidney transplantation patients into stable and rejecting classes.
Affiliation(s)
- Brisa N Sánchez
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Meihua Wu
- Gilead Sciences, Inc, Foster City, CA 94404, USA
- Peter X K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Wen Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
|
16
|
Shalabi A, Inoue M, Watkins J, De Rinaldis E, Coolen AC. Bayesian clinical classification from high-dimensional data: Signatures versus variability. Stat Methods Med Res 2016; 27:336-351. [PMID: 26984907 DOI: 10.1177/0962280216628901] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
When data exhibit imbalance between a large number d of covariates and a small number n of samples, clinical outcome prediction is impaired by overfitting and prohibitive computation demands. Here we study two simple Bayesian prediction protocols that can be applied to data of any dimension and any number of outcome classes. Calculating Bayesian integrals and optimal hyperparameters analytically leaves only a small number of numerical integrations, and CPU demands scale as O(nd). We compare their performance on synthetic and genomic data to the mclustDA method of Fraley and Raftery. For small d they perform as well as mclustDA or better. For d = 10,000 or more mclustDA breaks down computationally, while the Bayesian methods remain efficient. This allows us to explore phenomena typical of classification in high-dimensional spaces, such as overfitting and the reduced discriminative effectiveness of signatures compared to intra-class variability.
Affiliation(s)
- Akram Shalabi
- Institute for Mathematical and Molecular Biomedicine, King's College London, London, UK
- Masato Inoue
- Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Johnathan Watkins
- Breakthrough Breast Cancer Research Unit, Department of Research Oncology, Guy's Hospital, London, UK
- Anthony Cc Coolen
- Institute for Mathematical and Molecular Biomedicine, King's College London, London, UK
|
17
|
Mitchell L, Sloan TM, Mewissen M, Ghazal P, Forster T, Piotrowski M, Trew A. Parallel classification and feature selection in microarray data using SPRINT. Concurrency and Computation: Practice & Experience 2014; 26:854-865. [PMID: 24883047 PMCID: PMC4038771 DOI: 10.1002/cpe.2928] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time consuming or even not possible at all with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems but at the expense of increased complexity for the end user. The Simple Parallel R Interface is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier and feature selection through identification of differentially expressed genes using the rank product method.
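For readers unfamiliar with the rank product method mentioned above, a minimal serial sketch follows. It ranks genes within each replicate by fold change and takes the geometric mean of each gene's ranks. This is written in Python for illustration only; it is not the SPRINT implementation (which is an R library), the function name `rank_product` is hypothetical, ties are broken arbitrarily by sort order, and the permutation step normally used to assess significance is omitted.

```python
import math

def rank_product(fold_changes):
    """Hypothetical helper: fold_changes[g][k] is the log fold change of gene g in replicate k.

    Returns one rank product per gene; small values flag genes that are
    consistently among the most up-regulated across replicates.
    """
    n_genes = len(fold_changes)
    n_reps = len(fold_changes[0])
    log_sum = [0.0] * n_genes
    for k in range(n_reps):
        # Rank 1 goes to the largest fold change in this replicate.
        order = sorted(range(n_genes), key=lambda g: fold_changes[g][k], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_sum[g] += math.log(rank)
    # Geometric mean of the ranks, computed in log space for stability.
    return [math.exp(s / n_reps) for s in log_sum]

# Toy data: three genes, three replicates.
toy = [[2.0, 1.8, 2.2], [0.1, 0.3, -0.2], [1.0, 1.9, 0.5]]
rp = rank_product(toy)
```

Each replicate is ranked independently, which is what makes the method a natural candidate for parallelisation, particularly for the many permuted datasets needed to assess significance.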
Affiliation(s)
- Lawrence Mitchell
- EPCC, School of Physics and Astronomy, University of Edinburgh, Edinburgh, EH9 3JZ, UK
- Terence M Sloan
- EPCC, School of Physics and Astronomy, University of Edinburgh, Edinburgh, EH9 3JZ, UK
- Muriel Mewissen
- Division of Pathway Medicine, University of Edinburgh, Medical School, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- Peter Ghazal
- Division of Pathway Medicine, University of Edinburgh, Medical School, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- Thorsten Forster
- Division of Pathway Medicine, University of Edinburgh, Medical School, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- Michal Piotrowski
- EPCC, School of Physics and Astronomy, University of Edinburgh, Edinburgh, EH9 3JZ, UK
- Arthur Trew
- EPCC, School of Physics and Astronomy, University of Edinburgh, Edinburgh, EH9 3JZ, UK
|
18
|
Bocchini N, Giantin M, Crivellente F, Ferraresso S, Faustinelli I, Dacasto M, Cristofori P. Molecular biomarkers of phospholipidosis in rat blood and heart after amiodarone treatment. J Appl Toxicol 2014; 35:90-103. [DOI: 10.1002/jat.2992] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Affiliation(s)
- Nicola Bocchini
- Dipartimento di Biomedicina Comparata e Alimentazione; Università di Padova; viale dell'Università 16 I-35020 Legnaro (Padova) Italy
- Scuola di Dottorato in Scienze Veterinarie, indirizzo di Sanità pubblica e Patologia comparata; viale dell'Università 16 I-35020 Legnaro (Padova) Italy
- Mery Giantin
- Dipartimento di Biomedicina Comparata e Alimentazione; Università di Padova; viale dell'Università 16 I-35020 Legnaro (Padova) Italy
- Serena Ferraresso
- Dipartimento di Biomedicina Comparata e Alimentazione; Università di Padova; viale dell'Università 16 I-35020 Legnaro (Padova) Italy
- Ivo Faustinelli
- Preclinical Technologies; Aptuit, via Fleming 4 37135 Verona Italy
- Mauro Dacasto
- Dipartimento di Biomedicina Comparata e Alimentazione; Università di Padova; viale dell'Università 16 I-35020 Legnaro (Padova) Italy
|
19
|
Basagaña X, Barrera-Gómez J, Benet M, Antó JM, Garcia-Aymerich J. A framework for multiple imputation in cluster analysis. Am J Epidemiol 2013; 177:718-25. [PMID: 23445902 DOI: 10.1093/aje/kws289] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/13/2022] Open
Abstract
Multiple imputation is a common technique for dealing with missing values and is mostly applied in regression settings. Its application in cluster analysis problems, where the main objective is to classify individuals into homogeneous groups, involves several difficulties which are not well characterized in the current literature. In this paper, we propose a framework for applying multiple imputation to cluster analysis when the original data contain missing values. The proposed framework incorporates the selection of the final number of clusters and a variable reduction procedure, which may be needed in data sets where the ratio of the number of persons to the number of variables is small. We suggest some ways to report how the uncertainty due to multiple imputation of missing data affects the cluster analysis outcomes, namely the final number of clusters, the results of a variable selection procedure (if applied), and the assignment of individuals to clusters. The proposed framework is illustrated with data from the Phenotype and Course of Chronic Obstructive Pulmonary Disease (PAC-COPD) Study (Spain, 2004-2008), which aimed to classify patients with chronic obstructive pulmonary disease into different disease subtypes.
Affiliation(s)
- Xavier Basagaña
- Centre for Research in Environmental Epidemiology, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain.
|
20
|
Sánchez-Vega F, Younes L, Geman D. Learning multivariate distributions by competitive assembly of marginals. IEEE Trans Pattern Anal Mach Intell 2013; 35:398-410. [PMID: 22529323 DOI: 10.1109/tpami.2012.96] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks, and designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or "primitives," which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Since primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases; we adapt the primitives to the amount of training data and impose rules which restrict the merging of them into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks which are binary forests. Structure optimization corresponds to an integer linear program and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.
Affiliation(s)
- Francisco Sánchez-Vega
- Department of Applied Mathematics and Statistics, Center for Imaging Science and Institute for Computational Medicine, Johns Hopkins University, Clark Hall, 3400 N. Charles St., Baltimore, MD 21218, USA.
|
21
|
Albanese C, Rodriguez OC, VanMeter J, Fricke ST, Rood BR, Lee Y, Wang SS, Madhavan S, Gusev Y, Petricoin EF, Wang Y. Preclinical magnetic resonance imaging and systems biology in cancer research: current applications and challenges. Am J Pathol 2013; 182:312-8. [PMID: 23219428 PMCID: PMC3969503 DOI: 10.1016/j.ajpath.2012.09.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/15/2012] [Revised: 09/03/2012] [Accepted: 09/18/2012] [Indexed: 01/19/2023]
Abstract
Biologically accurate mouse models of human cancer have become important tools for the study of human disease. The anatomical location of various target organs, such as brain, pancreas, and prostate, makes determination of disease status difficult. Imaging modalities, such as magnetic resonance imaging, can greatly enhance diagnosis, and longitudinal imaging of tumor progression is an important source of experimental data. Even in models where the tumors arise in areas that permit visual determination of tumorigenesis, longitudinal anatomical and functional imaging can enhance the scope of studies by facilitating the assessment of biological alterations (such as changes in angiogenesis, metabolism, and cellular invasion) as well as tissue perfusion and diffusion. One of the challenges in preclinical imaging is the development of infrastructural platforms required for integrating in vivo imaging and therapeutic response data with ex vivo pathological and molecular data using a more systems-based multiscale modeling approach. Further challenges exist in integrating these data for computational modeling to better understand the pathobiology of cancer and to better affect its cure. We review the current applications of preclinical imaging and discuss the implications of applying functional imaging to visualize cancer progression and treatment. Finally, we provide new data from an ongoing preclinical drug study demonstrating how multiscale modeling can lead to a more comprehensive understanding of cancer biology and therapy.
Affiliation(s)
- Chris Albanese
- Lombardi Comprehensive Cancer Center and Department of Oncology, Georgetown University Medical Center, Washington, District of Columbia 20057, USA.
|
22
|
Reliably assessing prediction reliability for high dimensional QSAR data. Mol Divers 2012; 17:63-73. [DOI: 10.1007/s11030-012-9415-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/05/2012] [Accepted: 12/03/2012] [Indexed: 10/27/2022]
|
23
|
Bouker KB, Wang Y, Xuan J, Clarke R. Antiestrogen Resistance and the Application of Systems Biology. Drug Discov Today Dis Mech 2012; 9:e11-e17. [PMID: 23539064 DOI: 10.1016/j.ddmec.2012.10.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
Understanding the molecular changes that drive an acquired antiestrogen resistance phenotype is of major clinical relevance. Previous methodologies for addressing this question have taken a single gene/pathway approach and the resulting gains have been limited in terms of their clinical impact. Recent systems biology approaches allow for the integration of data from high throughput "-omics" technologies. We highlight recent advances in the field of antiestrogen resistance with a focus on transcriptomics, proteomics and methylomics.
Affiliation(s)
- Kerrie B Bouker
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, Washington, DC 20057, U.S.A
|
24
|
Varraso R, Garcia-Aymerich J, Monier F, Le Moual N, De Batlle J, Miranda G, Pison C, Romieu I, Kauffmann F, Maccario J. Assessment of dietary patterns in nutritional epidemiology: principal component analysis compared with confirmatory factor analysis. Am J Clin Nutr 2012; 96:1079-92. [PMID: 23034967 DOI: 10.3945/ajcn.112.038109] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND In the field of nutritional epidemiology, principal component analysis (PCA) has been used to derive patterns, but the robustness of interpretation might be an issue when the sample size is small. The authors proposed the alternative use of confirmatory factor analysis (CFA) to define such patterns. OBJECTIVE The aim was to compare dietary patterns derived through PCA and CFA used as equivalent approaches in terms of stability and relevance. DESIGN PCA and CFA were performed in 2 different studies: the Epidemiological Study on the Genetics and Environment of Asthma 2-France (EGEA2-France; n = 1236) and the Phenotype and Course of Chronic Obstructive Pulmonary Disease study-Spain (n = 274). To check for stability, PCA and CFA were also performed in 2 subsamples from the EGEA2 study (n = 618 and 309). Statistical properties were evaluated by 1000 bootstrapped random sets of observations for each of the 4 subsamples. For each random set of observations, the distribution of the factor loading for each pattern was obtained and represented by using box-plots. To check for relevance, partial correlations between different nutrients and the different patterns derived by either PCA or CFA were calculated. RESULTS With the use of CFA, 2 consistent dietary patterns were derived in each subsample (the Prudent and the Western patterns), whereas dietary factors were less interpretable with the use of PCA (smaller median of factor loadings and higher dispersion), especially for the smallest subsample. Higher correlations were reported among total fiber, vitamins, minerals, and total lipids with patterns derived by using CFA than with patterns derived by using PCA. CONCLUSION The current study shows that CFA may be a useful alternative to PCA in epidemiologic studies, especially when the sample size is small.
Affiliation(s)
- Raphaëlle Varraso
- INSERM U1018/CESP, Respiratory and Environmental Epidemiology team (team 5), 16 avenue Paul Vaillant Couturier, 94807 Villejuif Cedex, France.
|
25
|
|
26
|
Kounelakis MG, Zervakis ME, Giakos GC, Postma GJ, Buydens LMC, Kotsiakis X. On the relevance of glycolysis process on brain gliomas. IEEE J Biomed Health Inform 2012; 17:128-35. [PMID: 22614725 DOI: 10.1109/titb.2012.2199128] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
The proposed analysis considers aspects of both statistical and biological validation of the glycolysis effect on brain gliomas, at both the genomic and metabolic levels. In particular, two independent datasets are analyzed in parallel, one engaging genomic (Microarray Expression) data and the other metabolomic (Magnetic Resonance Spectroscopy Imaging) data. The aim of this study is twofold: first, to show that, apart from the already studied genes (markers), other genes such as those involved in human cell glycolysis contribute significantly to glioma discrimination; second, to demonstrate how the glycolysis process can open new ways towards the design of patient-specific therapeutic protocols. The results of our analysis demonstrate that the combination of genes participating in the glycolytic process (ALDOA, ALDOC, ENO2, GAPDH, HK2, LDHA, LDHB, MDH1, PDHB, PFKM, PGI, PGK1, PGM1 and PKLR) with the already known tumor suppressors (PTEN, Rb, TP53), oncogenes (CDK4, EGFR, PDGF) and HIF-1 enhances the discrimination of low- versus high-grade gliomas, providing high prediction ability in a cross-validated framework. Following these results and supported by the biological effect of glycolytic genes on cancer cells, we address the study of glycolysis for the development of new treatment protocols.
|
27
|
Haining WN, Barnitz RA. Deconvolving heterogeneity in the CD8+ T-cell response to HIV. Curr Opin HIV AIDS 2012; 7:38-43. [PMID: 22156844 PMCID: PMC3291178 DOI: 10.1097/coh.0b013e32834dde1c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW This review will discuss the use of systems biology approaches to dissect the heterogeneity of the HIV-specific CD8+ T-cell response. RECENT FINDINGS New experimental approaches have allowed complex phenotypes of individual cells present in the human antigen-specific CD8+ T-cell response to be characterized. Genome-wide measurements of gene expression in antigen-specific T cells have created broad molecular phenotypes of the T-cell response to HIV. Pattern recognition algorithms to discover new subclasses of samples in microarray datasets are becoming increasingly sophisticated. Together, these advances now allow the heterogeneity of the T-cell response to HIV to be unraveled. SUMMARY The phenotype of antigen-specific T cells responding to pathogens like HIV in humans is seen as much 'noisier' than in animal models of infection. However, applying new systems biology tools may provide an opportunity to identify the sources of this 'noise' as novel, biologically distinct subclasses of the CD8+ T-cell response to HIV.
Affiliation(s)
- W Nicholas Haining
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA.
|
28
|
Sadanandam A, Futakuchi M, Lyssiotis CA, Gibb WJ, Singh RK. A cross-species analysis of a mouse model of breast cancer-specific osteolysis and human bone metastases using gene expression profiling. BMC Cancer 2011; 11:304. [PMID: 21774828 PMCID: PMC3171728 DOI: 10.1186/1471-2407-11-304] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/02/2010] [Accepted: 07/20/2011] [Indexed: 11/14/2022] Open
Abstract
Background: Breast cancer is the second leading cause of cancer-related death in women in the United States. During the advanced stages of disease, many breast cancer patients suffer from bone metastasis. These metastases are predominantly osteolytic and develop when tumor cells interact with bone. In vivo models that mimic the breast cancer-specific osteolytic bone microenvironment are limited. Previously, we developed a mouse model of tumor-bone interaction in which three mouse breast cancer cell lines were implanted onto the calvaria. Analysis of tumors from this model revealed that they exhibited strong bone resorption, induction of osteoclasts and intracranial penetration at the tumor bone (TB)-interface. Methods: In this study, we identified and used a TB microenvironment-specific gene expression signature from this model to extend our understanding of the metastatic bone microenvironment in human disease and to predict potential therapeutic targets. Results: We identified a TB signature consisting of 934 genes that were commonly (among our 3 cell lines) and specifically (as compared to tumor-alone area within the bone microenvironment) up- and down-regulated >2-fold at the TB interface in our mouse osteolytic model. By comparing the TB signature with gene expression profiles from human breast metastases and an in vitro osteoclast model, we demonstrate that our model mimics both the human breast cancer bone microenvironment and osteoclastogenesis. Furthermore, we observed enrichment in various signaling pathways specific to the TB interface; that is, TGF-β and myeloid self-renewal pathways were activated and the Wnt pathway was inactivated. Lastly, we used the TB-signature to predict cyclopenthiazide as a potential inhibitor of the TB interface. Conclusion: Our mouse breast cancer model morphologically and genetically resembles the osteoclastic bone microenvironment observed in human disease. Characterization of the gene expression signature specific to the TB interface in our model revealed signaling mechanisms operative in human breast cancer metastases and predicted a therapeutic inhibitor of cancer-mediated osteolysis.
Affiliation(s)
- Anguraj Sadanandam
- Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198-5900, USA.
|
29
|
|
30
|
Clarke R, Shajahan AN, Wang Y, Tyson JJ, Riggins RB, Weiner LM, Bauman WT, Xuan J, Zhang B, Facey C, Aiyer H, Cook K, Hickman FE, Tavassoly I, Verdugo A, Chen C, Zwart A, Wärri A, Hilakivi-Clarke LA. Endoplasmic reticulum stress, the unfolded protein response, and gene network modeling in antiestrogen resistant breast cancer. Horm Mol Biol Clin Investig 2011; 5:35-44. [PMID: 23930139 DOI: 10.1515/hmbci.2010.073] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023]
Abstract
Lack of understanding of endocrine resistance remains one of the major challenges for breast cancer researchers, clinicians, and patients. Current reductionist approaches to understanding the molecular signaling driving resistance have offered mostly incremental progress over the past 10 years. As the field of systems biology has begun to mature, the approaches and network modeling tools being developed and applied therein offer a different way to think about how molecular signaling and the regulation of critical cellular functions are integrated. To gain novel insights, we first describe some of the key challenges facing network modeling of endocrine resistance, many of which arise from the properties of the data spaces being studied. We then use activation of the unfolded protein response (UPR) following induction of endoplasmic reticulum stress in breast cancer cells by antiestrogens, to illustrate our approaches to computational modeling. Activation of UPR is a key determinant of cell fate decision making and regulation of autophagy and apoptosis. These initial studies provide insight into a small subnetwork topology obtained using differential dependency network analysis and focused on the UPR gene XBP1. The XBP1 subnetwork topology incorporates BCAR3, BCL2, BIK, NFκB, and other genes as nodes; the connecting edges represent the dependency structures amongst these nodes. As data from ongoing cellular and molecular studies become available, we will build detailed mathematical models of this XBP1-UPR network.
Affiliation(s)
- Robert Clarke
- Department of Oncology, Georgetown University School of Medicine, Washington, DC 20057, U.S.A. ; Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, Washington, DC 20057, U.S.A
|
31
|
Yu G, Li H, Ha S, Shih IM, Clarke R, Hoffman EP, Madhavan S, Xuan J, Wang Y. PUGSVM: a caBIG™ analytical tool for multiclass gene selection and predictive classification. Bioinformatics 2010; 27:736-8. [PMID: 21186245 DOI: 10.1093/bioinformatics/btq721] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases. AVAILABILITY: http://www.cbil.ece.vt.edu/caBIG-PUGSVM.htm.
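One possible reading of "the union of one-versus-everyone phenotypic upregulated genes" can be sketched as follows: for each class, score every gene by how far its class mean exceeds the highest mean among all other classes, keep the top genes per class, and take the union. The margin criterion, the `top_n` cutoff, and the function name `upregulated_gene_union` are assumptions made for illustration; the actual PUGSVM selection rule may differ, and the support vector machine stage is not shown.

```python
def upregulated_gene_union(expr, labels, top_n=1):
    """Hypothetical helper: expr[s][g] is the log-expression of gene g in sample s,
    labels[s] is the class of sample s. Requires at least two classes.

    Returns the sorted indices of genes selected for any class.
    """
    classes = sorted(set(labels))
    n_genes = len(expr[0])
    # Per-class mean expression of every gene.
    means = {}
    for c in classes:
        rows = [row for row, lab in zip(expr, labels) if lab == c]
        means[c] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
    selected = set()
    for c in classes:
        # Margin of this class over the strongest competing class, so a gene
        # must beat every other class individually, not just the pooled rest.
        margin = [means[c][g] - max(means[o][g] for o in classes if o != c)
                  for g in range(n_genes)]
        best = sorted(range(n_genes), key=lambda g: margin[g], reverse=True)[:top_n]
        selected.update(best)
    return sorted(selected)

# Toy data: genes 0, 1, 2 mark classes A, B, C; gene 3 is uninformative.
toy_expr = [[5, 1, 1, 2], [6, 1, 1, 2],
            [1, 5, 1, 2], [1, 6, 1, 2],
            [1, 1, 5, 2], [1, 1, 6, 2]]
toy_labels = ["A", "A", "B", "B", "C", "C"]
markers = upregulated_gene_union(toy_expr, toy_labels)
```

Comparing against the strongest competing class, rather than the pooled remainder, keeps a gene from being selected merely because it is low in one large class.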
Affiliation(s)
- Guoqiang Yu
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
|
32
|
Zhu L, Yang J, Song JN, Chou KC, Shen HB. Improving the accuracy of predicting disulfide connectivity by feature selection. J Comput Chem 2010; 31:1478-85. [PMID: 20127740 DOI: 10.1002/jcc.21433] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different protein polypeptide chains, which play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging due to the nonlocal nature of disulfide bonds in the context of sequences, and the number of possible disulfide patterns grows exponentially when the number of cysteine residues increases. In the previous studies, disulfide connectivity prediction was usually performed in high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the dimension disaster, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. On the basis of this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results have shown that the high-dimensional features contain redundant information, and the prediction performance can be further improved when these high-dimensional features are reduced to a lower but more compact dimensional space. Our results also indicate that the global protein features contribute little to the formation and prediction of disulfide bonds, while the local sequential and structural information play important roles. All these findings provide important insights for structural studies of disulfide-rich proteins.
Affiliation(s)
- Lin Zhu
- Department of Bioinformatics, Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
|
33
|
Bloss CS, Schiabor KM, Schork NJ. Human behavioral informatics in genetic studies of neuropsychiatric disease: multivariate profile-based analysis. Brain Res Bull 2010; 83:177-88. [PMID: 20433907 PMCID: PMC2941546 DOI: 10.1016/j.brainresbull.2010.04.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 09/10/2009] [Revised: 04/17/2010] [Accepted: 04/21/2010] [Indexed: 01/23/2023]
Abstract
While genome-wide association (GWA) studies have yielded notable findings with regard to the identification of risk variants in diseases such as obesity and diabetes, similar studies of schizophrenia - and neuropsychiatric diseases in general - have failed to produce strong findings. One plausible explanation for this relates to phenotypic heterogeneity and what may be inherent imprecision associated with diagnostic categories in neuropsychiatric disorders. In this review we discuss a general approach to addressing the problem of heterogeneity that draws on concepts in behavioral informatics and the use of multivariable behavioral profiles in genetic studies of neuropsychiatric disease. The use of behavioral profiles as phenotypes eliminates the need for categorizing individuals with different 'subtypes' of a disease into one group and provides a way to investigate genetic susceptibility to different neuropsychiatric disorders that share similar clinical characteristics, such as schizophrenia and bipolar disorder. Further, behavioral profiles are a direct, quantitative representation of the emotional, personality, and neurocognitive functioning of the individuals being studied, and as such, the use of these profiles may provide increased statistical power to detect genetic associations and linkages. We describe and discuss four general data analysis approaches that can be used to analyze and integrate multivariate behavioral profile data and high-dimensional genomic data. Ultimately, we propose that behavioral profile-based phenotypes provide a meaningful alternative to the use of single measures, such as diagnostic category, in genetic association studies of neuropsychiatric disease.
Affiliation(s)
- Cinnamon S. Bloss
- Scripps Genomic Medicine, Scripps Translational Science Institute, Scripps Health
- Kelly M. Schiabor
- Scripps Genomic Medicine, Scripps Translational Science Institute, Scripps Health
- Nicholas J. Schork
- Scripps Genomic Medicine, Scripps Translational Science Institute, Scripps Health
- Department of Molecular and Experimental Medicine, The Scripps Research Institute
|
34
|
Hudson LG, Gale JM, Padilla RS, Pickett G, Alexander BE, Wang J, Kusewitt DF. Microarray analysis of cutaneous squamous cell carcinomas reveals enhanced expression of epidermal differentiation complex genes. Mol Carcinog 2010; 49:619-29. [PMID: 20564339 DOI: 10.1002/mc.20636] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 12/27/2022]
Abstract
Gene expression profiles were determined for 12 cutaneous squamous cell carcinomas (SCC) removed from sun-exposed sites on nonimmunosuppressed patients. Gene expression in each SCC was compared to that in sun-exposed skin from the same patient using the Affymetrix HGU133 2.0 Plus GeneChip. We identified 440 genes with increased expression in SCC and 738 with decreased expression; overall we identified a large number of small changes in gene expression rather than a few marked changes that distinguished SCC from sun-exposed skin. Analyzing this robust data set according to biofunctional pathways using DAVID, transcriptional control elements using oPOSSUM, and chromosomal location using GSEA suggested genetic and epigenetic mechanisms of gene expression regulation in SCC. Some altered patterns of gene expression in SCC were consistent with regulation of spatially separated genes by a number of developmentally important transcription factors (forkhead, HMG, and homeo factors) that negatively regulated gene expression and by a few factors that positively regulated expression (Creb-1, NFkappaB, RelA, and Sp-1). We also found that coordinately enhanced expression of epidermal differentiation complex genes on chromosome 1q21 was a hallmark of SCC. A novel finding in our study was enhanced expression of keratin 13 in SCC, a result validated by immunohistochemical staining of an SCC tumor tissue array.
Affiliation(s)
- Laurie G Hudson
- University of New Mexico College of Pharmacy, Albuquerque, New Mexico, USA
|
35
|
Buckley-James boosting for survival analysis with high-dimensional biomarker data. Stat Appl Genet Mol Biol 2010; 9:Article24. [PMID: 20597850 DOI: 10.2202/1544-6115.1550] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In the regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection simultaneously. We propose the Buckley-James boosting for the semiparametric accelerated failure time models with right censored survival data, which can be used to predict survival of future patients using the high-dimensional genomic data. In the spirit of adaptive LASSO, twin boosting is also incorporated to fit more sparse models. The proposed methods have a unified approach to fit linear models, non-linear effects models with possible interactions. The methods can perform variable selection and parameter estimation simultaneously. The proposed methods are evaluated by simulations and applied to a recent microarray gene expression data set for patients with diffuse large B-cell lymphoma under the current gold standard therapy.
|
36
|
What should physicians look for in evaluating prognostic gene-expression signatures? Nat Rev Clin Oncol 2010; 7:327-34. [PMID: 20421890 DOI: 10.1038/nrclinonc.2010.60] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022]
Abstract
Most cancer treatments benefit only a minority of patients. This has led to a widespread interest in the identification of gene-expression-based prognostic signatures. Well-developed and validated genomic signatures can lead to personalized treatment decisions resulting in improved patient management. However, the pace of acceptance of these signatures in clinical practice has been slow. This is because many of the signatures have been developed without clear focus on the intended clinical use, and proper independent validation studies establishing their medical utility have rarely been performed. The practicing physician and the patient are thus left in doubt about the reliability and medical utility of the signatures. We aim to provide guidance to physicians in critically evaluating published studies on prognostic gene-expression signatures so that they are better equipped to decide which signatures, if any, have sufficient merit for use, in conjunction with other factors in helping their patients to make good treatment decisions. A discussion of the lessons to be learned from the successful development of the Oncotype DX genetic test for breast cancer is presented and contrasted with a review of the current status of prognostic gene-expression signatures in non-small-cell lung cancer.
|
37
|
Jamieson AR, Giger ML, Drukker K, Li H, Yuan Y, Bhooshan N. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE. Med Phys 2010; 37:339-51. [PMID: 20175497 DOI: 10.1118/1.3267037] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Indexed: 11/07/2022] Open
Abstract
PURPOSE In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. METHODS These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance. RESULTS In the large U.S. data set, sample high performance results include AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. CONCLUSIONS Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
Affiliation(s)
- Andrew R Jamieson
- Department of Radiology, University of Chicago, Chicago, Illinois 60637, USA.
|
38
|
Day A, Dong J, Funari VA, Harry B, Strom SP, Cohn DH, Nelson SF. Disease gene characterization through large-scale co-expression analysis. PLoS One 2009; 4:e8491. [PMID: 20046828 PMCID: PMC2797297 DOI: 10.1371/journal.pone.0008491] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 10/16/2009] [Accepted: 12/07/2009] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND In the post genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET). RESULTS Celsius, the largest co-normalized microarray dataset of Affymetrix based gene expression, was used to calculate the correlation between all possible gene pairs on all platforms, and generate stored indexes in a web searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Using a small seed list of known cartilage-selective genes, UGET extended the list of known genes by identifying 32 new highly cartilage-selective genes. Of these, 7 of 10 tested were validated by qPCR including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested UGET and other gene expression based prioritization tools to identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2) and then compared UGET to other gene expression based prioritization programs which use small but discrete and well annotated datasets. Finally, we observed a significantly higher gene correlation shared between genes in disease networks associated with similar complex or Mendelian disorders. DISCUSSION UGET is an invaluable resource for a geneticist that permits the rapid inclusion of expression criteria from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays UGET annotates and prioritizes genes better than other tools especially with rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritization of candidate genes for sequence analysis.
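The seed-list idea in this abstract (extending a small set of known genes by their correlation with other genes across many arrays) can be illustrated with a minimal sketch. This is not the UGET implementation: the gene names, the expression values, the use of Pearson correlation, and the mean-over-seeds score are all assumptions made for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return cov / (sx * sy)

def rank_by_coexpression(expression, seeds):
    """Rank non-seed genes by their mean correlation with the seed genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in seeds:
            continue
        total = sum(pearson(profile, expression[s]) for s in seeds)
        scores[gene] = total / len(seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: keys are genes, list entries are arrays (made-up values).
expression = {
    "SEED_A": [1.0, 2.0, 3.0, 4.0],
    "SEED_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_X": [1.1, 2.1, 2.9, 4.2],  # tracks the seeds
    "GENE_Y": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with the seeds
    "GENE_Z": [5.0, 5.0, 5.0, 5.0],  # flat profile
}
ranking = rank_by_coexpression(expression, seeds={"SEED_A", "SEED_B"})
for gene, score in ranking:
    print(f"{gene}\t{score:+.3f}")
```

With these toy values the gene that tracks the seeds ranks first and the anti-correlated gene ranks last. A real analysis over thousands of arrays would likely need precomputed, indexed correlations, as the abstract describes, rather than this pairwise loop.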
Affiliation(s)
- Allen Day
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Jun Dong
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Vincent A. Funari
- Cedars-Sinai Medical Center, Medical Genetics Institute, Los Angeles, California, United States of America
- Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Bret Harry
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Samuel P. Strom
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Dan H. Cohn
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Cedars-Sinai Medical Center, Medical Genetics Institute, Los Angeles, California, United States of America
- Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Stanley F. Nelson
- Department of Human Genetics, Molecular Biology Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Department of Psychiatry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
|
39
|
Clarke R, Shajahan AN, Riggins RB, Cho Y, Crawford A, Xuan J, Wang Y, Zwart A, Nehra R, Liu MC. Gene network signaling in hormone responsiveness modifies apoptosis and autophagy in breast cancer cells. J Steroid Biochem Mol Biol 2009; 114:8-20. [PMID: 19444933 PMCID: PMC2768542 DOI: 10.1016/j.jsbmb.2008.12.023] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Abstract
Resistance to endocrine therapies, whether de novo or acquired, remains a major limitation in the ability to cure many tumors that express detectable levels of the estrogen receptor alpha protein (ER). While several resistance phenotypes have been described, endocrine unresponsiveness in the context of therapy-induced tumor growth appears to be the most prevalent. The signaling that regulates endocrine resistant phenotypes is poorly understood but it involves a complex signaling network with a topology that includes redundant and degenerative features. To be relevant to clinical outcomes, the most pertinent features of this network are those that ultimately affect the endocrine-regulated components of the cell fate and cell proliferation machineries. We show that autophagy, as supported by the endocrine regulation of monodansylcadaverine staining, increased LC3 cleavage, and reduced expression of p62/SQSTM1, plays an important role in breast cancer cells responding to endocrine therapy. We further show that the cell fate machinery includes both apoptotic and autophagic functions that are potentially regulated through integrated signaling that flows through key members of the BCL2 gene family and beclin-1 (BECN1). This signaling links cellular functions in mitochondria and endoplasmic reticulum, the latter as a consequence of induction of the unfolded protein response. We have taken a seed-gene approach to begin extracting critical nodes and edges that represent central signaling events in the endocrine regulation of apoptosis and autophagy. Three seed nodes were identified from global gene or protein expression analyses and supported by subsequent functional studies that established their abilities to affect cell fate. 
The seed nodes of nuclear factor kappa B (NFkappaB), interferon regulatory factor-1 (IRF1), and X-box binding protein-1 (XBP1) are linked by directional edges that support signal flow through a preliminary network that is grown to include key regulators of their individual function: NEMO/IKKgamma, nucleophosmin and ER respectively. Signaling proceeds through BCL2 gene family members and BECN1 ultimately to regulate cell fate.
Affiliation(s)
- Robert Clarke
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, Washington, DC 20057, USA.
|
40
|
Warrick JW, Murphy WL, Beebe DJ. Screening the cellular microenvironment: a role for microfluidics. IEEE Rev Biomed Eng 2008; 1:75-93. [PMID: 20190880 DOI: 10.1109/rbme.2008.2008241] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Abstract
The cellular microenvironment is an increasingly discussed topic in cell biology as it has been implicated in the progression of cancer and the maintenance of stem cells. The microenvironment of a cell is an organized combination of extracellular matrix (ECM), cells, and interstitial fluid that influence cellular phenotype through physical, mechanical, and biochemical mechanisms. Screening can be used to map combinations of cells and microenvironments to phenotypic outcomes in a way that can help develop more predictive in vitro models and to better understand phenotypic mechanisms from a systems biology perspective. This paper examines microenvironmental screening in terms of outcomes and benefits, key elements of the screening process, challenges for implementation, and a possible role for microfluidics as the screening platform. To assess microfluidics for use in microenvironmental screening, examples and categories of micro-scale and microfluidic technology are highlighted. Microfluidic technology shows promise for simultaneous control of multiple parameters of the microenvironment and can provide a base for scaling advanced cell-based experiments into automated high-throughput formats.
Affiliation(s)
- Jay W Warrick
- Department of Biomedical Engineering, University of Wisconsin, Madison, WI 53706-1609, USA
|
41
|
Classen S, Staratschek-Jox A, Schultze JL. Use of genome-wide high-throughput technologies in biomarker development. Biomark Med 2008; 2:509-24. [DOI: 10.2217/17520363.2.5.509] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/02/2023] Open
Abstract
In recent years, the usage of high-throughput technologies in the fields of genomics, transcriptomics, proteomics and metabolomics for biomarker discovery has expanded enormously. Biomarkers can be applied for many purposes, including diagnosis, prognosis, staging and selecting appropriate patient therapy. In addition, biomarkers can provide information on disease mechanism or progression. Biomarker development for clinical application encompasses phases for their discovery and characterization, assay development and, finally, implementation using automated platforms employed in clinical laboratories. However, translation from bench to bedside outside a research-oriented environment has proven to be more difficult. This is reflected by only few new biomarkers being integrated into clinical application in the last years. This article reviews currently used high-throughput technologies for the identification of biomarkers, as well as present approaches to increase the percentage of biomarkers that pass the barriers for clinical application.
Affiliation(s)
- Sabine Classen
- Molecular Immune & Cell Biology, Laboratory for Genomics & Immunoregulation, LIMES (Life and Medical Sciences) Bonn Program Unit, University of Bonn Karlrobert-Kreitenstraat 13, D-53115, Bonn, Germany
- Andrea Staratschek-Jox
- Molecular Immune & Cell Biology, Laboratory for Genomics & Immunoregulation, LIMES (Life and Medical Sciences) Bonn Program Unit, University of Bonn Karlrobert-Kreitenstraat 13, D-53115, Bonn, Germany
- Joachim L Schultze
- Molecular Immune & Cell Biology, Laboratory for Genomics & Immunoregulation, LIMES (Life and Medical Sciences) Bonn Program Unit, University of Bonn Karlrobert-Kreitenstraat 13, D-53115, Bonn, Germany
|
42
|
Zhu Y, Li H, Miller DJ, Wang Z, Xuan J, Clarke R, Hoffman EP, Wang Y. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data. BMC Bioinformatics 2008; 9:383. [PMID: 18801195 PMCID: PMC2566986 DOI: 10.1186/1471-2105-9-383] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/16/2008] [Accepted: 09/18/2008] [Indexed: 12/31/2022] Open
Abstract
Background The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. Results In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. 
Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering (wherein phenotype labels for samples are known), albeit with minor algorithm modifications customized to each of these tasks. Conclusion VISDA achieved robust and superior clustering accuracy, compared with several benchmark clustering schemes. The model order selection scheme in VISDA was shown to be effective for high dimensional genomic data clustering. On muscular dystrophy data and muscle regeneration data, VISDA identified biologically relevant co-expressed gene clusters. VISDA also captured the pathological relationships among different phenotypes revealed at the molecular level, through phenotype clustering on muscular dystrophy data and multi-category cancer data.
Affiliation(s)
- Yitan Zhu
- Department of Electrical and Computer Engineering, Virginia Polytechnic and State University, Arlington, VA 22203, USA.
|