1
Belhechmi S, Le Teuff G, De Bin R, Rotolo F, Michiels S. Favoring the hierarchical constraint in penalized survival models for randomized trials in precision medicine. BMC Bioinformatics 2023; 24:96. [PMID: 36927444 PMCID: PMC10022294 DOI: 10.1186/s12859-023-05162-x] [Received: 05/29/2022] [Accepted: 01/27/2023]
Abstract
BACKGROUND Biomarker-treatment interactions are commonly investigated in randomized clinical trials (RCTs) to advance precision medicine. The hierarchical interaction constraint states that an interaction should be in a model only if its main effects are also in the model. However, standard penalized statistical approaches do not guarantee this constraint. We aimed to find a compromise for high-dimensional data between the need for sparse model selection and the need for the hierarchical constraint. RESULTS To favor the hierarchical interaction constraint, we proposed creating groups composed of the biomarker main effect and its interaction with treatment and performing bi-level selection on these groups. We proposed two weighting approaches (Single Wald (SW) and likelihood ratio test (LRT)) for the adaptive lasso method. The selection performance of these two approaches was compared with that of alternative lasso extensions (adaptive lasso with ridge-based weights, composite Minimax Concave Penalty, group exponential lasso and Sparse Group Lasso) in a simulation study. An RCT (NSABP B-31) that randomized 1574 patients (431 events) with early breast cancer to evaluate the effect of adjuvant trastuzumab on distant-recurrence-free survival, with expression data from 462 genes measured in the tumour, served as an illustration. The simulation study showed that the adaptive lasso with LRT and SW weights and the group exponential lasso favored the hierarchical interaction constraint. Overall, in the alternative scenarios, they had the best balance of false discovery and false negative rates for the main effects of the selected interactions. For NSABP B-31, 12 gene-treatment interactions were identified more than 20% of the time by the different methods. Among the methods, the adaptive lasso (SW) approach offered the best trade-off between a high number of selected gene-treatment interactions and a high proportion of selection of both the gene-treatment interaction and its main effect. CONCLUSIONS Adaptive lasso with Single Wald and likelihood ratio test weighting and the group exponential lasso outperformed their competitors in favoring the hierarchical constraint of the biomarker-treatment interaction. However, the performance of the methods tends to decrease in the presence of prognostic biomarkers.
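The hierarchical interaction constraint described in this abstract can be checked on any fitted model with a few lines of code. The sketch below is only an illustration, not the authors' implementation: the function name `hierarchy_rate` and the gene labels are hypothetical, and it assumes each interaction is identified by the name of its biomarker.

```python
def hierarchy_rate(selected_main, selected_interactions):
    """Share of selected interactions whose main effect is also selected.

    Both arguments are collections of biomarker names (hypothetical labels).
    Returns None when no interaction was selected, since the rate is undefined.
    """
    main = set(selected_main)
    inter = set(selected_interactions)
    if not inter:
        return None
    return len(inter & main) / len(inter)


# Made-up selection result: G1 and G2 respect the hierarchy, G3 and G4 do not.
main_effects = {"G1", "G2", "G5"}
interactions = {"G1", "G2", "G3", "G4"}
rate = hierarchy_rate(main_effects, interactions)
print(rate)
```

A rate of 1 would mean every selected interaction is accompanied by its main effect, which is the behavior the constraint asks for.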
Affiliation(s)
- Shaima Belhechmi
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
- Gwénaël Le Teuff
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
- Federico Rotolo
  - Biostatistics and Data Management Unit, Innate Pharma, Marseille, France
- Stefan Michiels
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
2
Peixoto C, Lopes MB, Martins M, Casimiro S, Sobral D, Grosso AR, Abreu C, Macedo D, Costa AL, Pais H, Alvim C, Mansinho A, Filipe P, Costa PMD, Fernandes A, Borralho P, Ferreira C, Malaquias J, Quintela A, Kaplan S, Golkaram M, Salmans M, Khan N, Vijayaraghavan R, Zhang S, Pawlowski T, Godsey J, So A, Liu L, Costa L, Vinga S. Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization. BMC Bioinformatics 2023; 24:17. [PMID: 36647008 PMCID: PMC9841719 DOI: 10.1186/s12859-022-05104-z] [Received: 03/15/2022] [Accepted: 12/07/2022]
Abstract
Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways where metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA-sequencing is an essential tool used for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those that are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in the methods' accuracy. Further classification using the pre-selected genes found by different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, the use of network-based correlation information (iTwiner) for gene selection produced the best classification results and the identification of more stable and robust gene sets. Some are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602) and regulation of gene transcription (NME2P2).
We show that the classification of CRC patients based on pre-selected features by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models' predictive performance. Moreover, the use of correlation-based penalization for biomarker selection stands as a promising strategy for predicting patients' groups based on RNA-seq data.
Affiliation(s)
- Carolina Peixoto
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal
- Marta B. Lopes
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), NOVA School of Science and Technology, 2829-516 Caparica, Portugal; Center for Mathematics and Applications (NOVA MATH), NOVA School of Science and Technology (FCT NOVA), 2829-516 Caparica, Portugal
- Marta Martins
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Sandra Casimiro
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Daniel Sobral
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Ana Rita Grosso
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Catarina Abreu
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Daniela Macedo
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Ana Lúcia Costa
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Helena Pais
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Cecília Alvim
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- André Mansinho
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal; Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Pedro Filipe
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Pedro Marques da Costa
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Afonso Fernandes
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Paula Borralho
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Cristina Ferreira
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- João Malaquias
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- António Quintela
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Shannon Kaplan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Mahdi Golkaram
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Michael Salmans
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Nafeesa Khan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Raakhee Vijayaraghavan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Shile Zhang
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Traci Pawlowski
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Jim Godsey
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Alex So
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Li Liu
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Luís Costa
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal; Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal; IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisbon, Portugal
3
Saw SP, Ang MK, Tan DS. Adjuvant Immunotherapy in Patients with Early-Stage Non-small Cell Lung Cancer and Future Directions. Curr Treat Options Oncol 2022; 23:1721-1731. [PMID: 36451063 DOI: 10.1007/s11864-022-01034-3] [Accepted: 10/31/2022]
Abstract
OPINION STATEMENT While cisplatin-based adjuvant chemotherapy has been the standard of care for the past two decades, the recent introduction of immunotherapy has heralded an important milestone in the adjuvant landscape of early-stage non-small cell lung cancer (NSCLC). The landmark approval of adjuvant atezolizumab based on disease-free survival (DFS) benefit in IMpower010 was swiftly followed by the recent data for use of adjuvant pembrolizumab in PEARLS/KEYNOTE-091, and similar trials involving other immune checkpoint inhibitors are eagerly anticipated. Although both atezolizumab and pembrolizumab demonstrated a significant DFS benefit in the intention-to-treat population, key subgroup analyses have raised questions about the role of predictive biomarkers such as PD-L1 expression and EGFR-mutation status. In this review, we examine the data from the two important trials (IMpower010 and PEARLS/KEYNOTE-091), discuss the controversies surrounding adjuvant immunotherapy including appropriate endpoints, biomarker selection and highlight key considerations in oncogene-driven NSCLC. Finally, we propose future directions including the impact of neoadjuvant therapy on developments in the adjuvant immunotherapy paradigm and role of minimal residual disease (MRD).
Affiliation(s)
- Stephanie PL Saw
  - Division of Medical Oncology, National Cancer Centre Singapore, 11 Hospital Crescent, Singapore, 169610, Singapore
- Mei-Kim Ang
  - Division of Medical Oncology, National Cancer Centre Singapore, 11 Hospital Crescent, Singapore, 169610, Singapore
- Daniel SW Tan
  - Division of Medical Oncology, National Cancer Centre Singapore, 11 Hospital Crescent, Singapore, 169610, Singapore; SingHealth Duke-NUS Oncology Academic Clinical Programme, 11 Hospital Crescent, Singapore, 169610, Singapore
4
Rudar J, Porter TM, Wright M, Golding GB, Hajibabaei M. LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data. BMC Bioinformatics 2022; 23:110. [PMID: 35361114 PMCID: PMC8969335 DOI: 10.1186/s12859-022-04631-z] [Received: 08/12/2021] [Accepted: 03/07/2022]
Abstract
Background Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery. Results We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). 
Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries. Conclusions Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04631-z.
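The abstract reports performance as a balanced accuracy score. As a reminder of what that metric measures, here is a minimal sketch using the usual definition (the mean of per-class recall); the function name and the toy labels are illustrative assumptions and have no connection to the LANDMark code base or its data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which weights every class equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class
        idx = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy labels (made up): class "a" recall is 3/4, class "b" recall is 2/2.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "b"]
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Unlike plain accuracy, this score does not reward a classifier for favoring the majority class, which matters when sample groups are unevenly sized.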
Affiliation(s)
- Josip Rudar
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Teresita M Porter
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Michael Wright
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- G Brian Golding
  - Department of Biology, McMaster University, 1280 Main St. West, Hamilton, ON, L8S 4K1, Canada
- Mehrdad Hajibabaei
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
5
Belhechmi S, Bin RD, Rotolo F, Michiels S. Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models. BMC Bioinformatics 2020; 21:277. [PMID: 32615919 PMCID: PMC7331150 DOI: 10.1186/s12859-020-03618-y] [Received: 01/03/2020] [Accepted: 06/19/2020]
Abstract
BACKGROUND The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables for a time-to-event outcome in high-dimensional data. However, these selection methods focus on a homogeneous set of variables and do not take into account the case of predictors belonging to functional groups; typically, genomic data can be grouped according to biological pathways or to different types of collected data. Another challenge is that the standard lasso penalization is known to have a high false discovery rate. RESULTS We evaluated different penalizations in a Cox model to select grouped variables, in order to further penalize variables that, in addition to having a low effect, belong to a group with a low overall effect, and to favor the selection of variables that, in addition to having a large effect, belong to a group with a large overall effect. We considered the case of prespecified and disjoint groups and proposed diverse weights for the adaptive lasso method. In particular, we proposed the product Max Single Wald by Single Wald weighting (MSW*SW), which takes into account information on both the individual biomarker and the group to which it belongs. Through simulations, we compared the selection and prediction ability of our approach with the standard lasso, the composite Minimax Concave Penalty (cMCP), the group exponential lasso (gel), the Integrative L1-Penalized Regression with Penalty Factors (IPF-Lasso), and the Sparse Group Lasso (SGL) methods. In addition, we illustrated the methods using gene expression data of 614 breast cancer patients. CONCLUSIONS The adaptive lasso with the MSW*SW weighting method incorporates both the information in the grouping structure and the individual variable. It outperformed the competitors by reducing the false discovery rate without severely increasing the false negative rate.
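The conclusion above is stated in terms of the false discovery rate and the false negative rate of a variable selection. A minimal sketch of how these two rates are commonly computed in a simulation, where the truly active variables are known, is given below. The function name and the biomarker labels are hypothetical, and the paper may use a slightly different convention for empty selections.

```python
def selection_error_rates(selected, truly_active):
    """Return (FDR, FNR) for one selection result.

    FDR = false positives / number selected
    FNR = false negatives / number truly active
    Both rates are reported as 0.0 when their denominator is zero
    (an assumption made here for convenience).
    """
    selected, truly_active = set(selected), set(truly_active)
    tp = len(selected & truly_active)
    fp = len(selected - truly_active)
    fn = len(truly_active - selected)
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    return fdr, fnr


# Made-up example: 4 variables selected, 5 truly active, 2 in common.
fdr, fnr = selection_error_rates(
    selected={"b1", "b2", "b3", "b7"},
    truly_active={"b1", "b2", "b4", "b5", "b6"},
)
print(fdr, fnr)
```

A method that lowers the first number without raising the second much is what the abstract describes as a favorable trade-off.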
Affiliation(s)
- Shaima Belhechmi
  - Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat, Villejuif, F-94805, France; Service de biostatistique et d'épidémiologie, Gustave Roussy, Villejuif, F-94805, France
- Federico Rotolo
  - Biostatistics and Data Management Unit, Innate Pharma, Marseille, France
- Stefan Michiels
  - Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat, Villejuif, F-94805, France; Service de biostatistique et d'épidémiologie, Gustave Roussy, Villejuif, F-94805, France
6
Abstract
Biomarker selection, or feature selection, from survival data is a topic of considerable interest. Various survival analysis approaches for biomarker selection have recently been developed; however, current methods face growing challenges in handling high-dimensional, low-sample-size problems. We propose a novel Log-sum regularization estimator within the accelerated failure time (AFT) model for predicting cancer patient survival time with a few biomarkers. This approach is implemented with a path-seeking algorithm to speed up solving the Log-sum penalty. Additionally, the control parameter of the Log-sum penalty is modified by the Bayesian information criterion (BIC). The results indicate that our proposed approach achieves good performance on both simulated and real datasets compared with other ℓ1-type regularization methods for biomarker selection.
Affiliation(s)
- Sai Wang
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Hui Zhang
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Hua Chai
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Yong Liang
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China; State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China
7
Abstract
INTRODUCTION Biomarkers are objective indications of a medical state that can be measured accurately and reproducibly. Traditional biomarkers enable diagnosis of disease through detection of disease-specific molecules, disease-mediated molecular changes, or distinct physiological or anatomical signatures. Areas covered: This work provides a framework for selecting biomarkers that are most likely to provide useful information about a patient's disease state. Though the authors emphasize markers related to disease, this work is also applicable to biomarkers for monitoring physiological changes such as ovulation or pregnancy. Additionally, the scope was restricted to biomarkers that are amenable to analytical detection across a range of health care levels, including low resource settings. The authors describe trade-offs between biomarkers' sensitivity/specificity for a disease-causing agent, the complexity of detection, and how this knowledge can be applied to the development of diagnostic tests. This report also details additional assessment criteria for successful tests. Expert commentary: Biomarker selection should primarily be driven by an attempt to answer an explicit clinical question (preferably causative relationship of the biomarker to disease-state), and only then by test development expediency (ease of detection). This framework is useful for stakeholders from test developers to clinicians to identify the trade-offs for diagnostic biomarkers for any use case.
Affiliation(s)
- Samantha A Byrnes
  - Department of Bioengineering, University of Washington, Seattle, WA, USA; Intellectual Ventures Laboratory, Bellevue, WA, USA
8
Lu M, Zhou J, Naylor C, Kirkpatrick BD, Haque R, Petri WA, Ma JZ. Application of penalized linear regression methods to the selection of environmental enteropathy biomarkers. Biomark Res 2017; 5:9. [PMID: 28293424 PMCID: PMC5345248 DOI: 10.1186/s40364-017-0089-4] [Received: 10/20/2016] [Accepted: 03/01/2017]
Abstract
BACKGROUND Environmental Enteropathy (EE) is a subclinical condition caused by constant fecal-oral contamination and resulting in blunting of intestinal villi and intestinal inflammation. The primary interest of the clinical research is to evaluate the association between non-invasive EE biomarkers and malnutrition in a cohort of Bangladeshi children. The challenges are that the number of biomarkers/covariates is relatively large, and some of them are highly correlated. METHODS Many variable selection methods are available in the literature, but which are most appropriate for EE biomarker selection remains unclear. In this study, different variable selection approaches were applied and the performance of these methods was assessed numerically through simulation studies, assuming the correlations among covariates were similar to those in the Bangladesh cohort. The suggested methods from simulations were applied to the Bangladesh cohort to select the most relevant biomarkers for the growth response, and bootstrapping methods were used to evaluate the consistency of selection results. RESULTS Through simulation studies, SCAD (Smoothly Clipped Absolute Deviation), Adaptive LASSO (Least Absolute Shrinkage and Selection Operator) and MCP (Minimax Concave Penalty) are the suggested variable selection methods, compared to the traditional stepwise regression method. In the Bangladesh data, predictors such as mother weight, height-for-age z-score (HAZ) at week 18, and inflammation markers (Myeloperoxidase (MPO) at week 12 and soluble CD14 at week 18) are informative biomarkers associated with children's growth. CONCLUSIONS Penalized linear regression methods are plausible alternatives to traditional variable selection methods, and the suggested methods are applicable to other biomedical studies.
The selected early-stage biomarkers offer a potential explanation for the burden of malnutrition problems in low-income countries, allow early identification of infants at risk, and suggest pathways for intervention. TRIAL REGISTRATION This study was retrospectively registered with ClinicalTrials.gov, number NCT01375647, on June 3, 2011.
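To make the difference between the penalties named in this abstract concrete, the following sketch evaluates the standard textbook forms of the SCAD and MCP penalties for a single coefficient. It is not the authors' code. The default shape parameters (a = 3.7 for SCAD, gamma = 3 for MCP) are commonly used conventions and are assumptions here.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires a > 2)."""
    t = abs(beta)
    if t <= lam:
        return lam * t                      # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant for large effects


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax Concave Penalty for one coefficient (requires gamma > 1)."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2            # constant for large effects


# For a large coefficient both penalties level off, whereas the lasso
# penalty lam * |beta| keeps growing (it would be 10.0 here).
print(scad_penalty(10.0, lam=1.0), mcp_penalty(10.0, lam=1.0))
```

Because both penalties become flat for large coefficients, strong effects are shrunk less than under the plain lasso, which is the usual motivation for preferring them in variable selection.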
Affiliation(s)
- Miao Lu
  - Department of Statistics, University of Virginia, Charlottesville, USA
- Jianhui Zhou
  - Department of Statistics, University of Virginia, Charlottesville, USA
- Caitlin Naylor
  - Division of Infectious Diseases, School of Medicine, University of Virginia, Charlottesville, USA
- Beth D. Kirkpatrick
  - Department of Medicine and Vaccine Testing Center, University of Vermont College of Medicine, Burlington, USA
- Rashidul Haque
  - The International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka, Bangladesh
- William A. Petri
  - Division of Infectious Diseases, School of Medicine, University of Virginia, Charlottesville, USA
- Jennie Z. Ma
  - Division of Biostatistics, Department of Public Health Sciences, University of Virginia, Charlottesville, USA
9
Arevalillo JM, Navarro H. Exploring correlations in gene expression microarray data for maximum predictive-minimum redundancy biomarker selection and classification. Comput Biol Med 2013; 43:1437-43. [PMID: 24034735 DOI: 10.1016/j.compbiomed.2013.07.005] [Received: 12/26/2012] [Revised: 07/02/2013] [Accepted: 07/04/2013]
Abstract
An important issue in the analysis of gene expression microarray data is concerned with the extraction of valuable genetic interactions from high dimensional data sets containing gene expression levels collected for a small sample of assays. Past and ongoing research efforts have been focused on biomarker selection for phenotype classification. Usually, many genes convey useless information for classifying the outcome and should be removed from the analysis; on the other hand, some of them may be highly correlated, which reveals the presence of redundant expressed information. In this paper we propose a method for the selection of highly predictive genes having a low redundancy in their expression levels. The predictive accuracy of the selection is assessed by means of Classification and Regression Trees (CART) models which enable assessment of the performance of the selected genes for classifying the outcome variable and will also uncover complex genetic interactions. The method is illustrated throughout the paper using a public domain colon cancer gene expression data set.
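The idea of keeping highly predictive genes while discarding those whose expression is redundant can be illustrated with a generic greedy filter. The sketch below is a simplified stand-in and should not be read as the method proposed in the paper: the relevance scores, the correlation threshold of 0.8, the gene names and the expression values are all made-up assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def greedy_low_redundancy(expr, relevance, max_abs_corr=0.8):
    """Visit genes from most to least relevant; keep a gene only if its
    absolute correlation with every gene already kept is below the cap."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for gene in ranked:
        if all(abs(pearson(expr[gene], expr[k])) <= max_abs_corr for k in kept):
            kept.append(gene)
    return kept


# Toy expression profiles over five samples (hypothetical values).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly collinear with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
relevance = {"geneA": 0.9, "geneB": 0.85, "geneC": 0.4}
selected = greedy_low_redundancy(expr, relevance)
print(selected)
```

In this toy case geneB is dropped despite its high relevance because it carries almost the same information as geneA, while the less relevant but uncorrelated geneC is retained.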
10
Droog M, Beelen K, Linn S, Zwart W. Tamoxifen resistance: from bench to bedside. Eur J Pharmacol 2013; 717:47-57. [PMID: 23545365 DOI: 10.1016/j.ejphar.2012.11.071] [Received: 07/01/2012] [Revised: 11/20/2012] [Accepted: 11/23/2012]
Abstract
Although tamoxifen is a classical example of a targeted drug, a substantial proportion of estrogen receptor alpha positive breast cancer patients does not benefit from the drug. Over the last few decades, many potential biomarkers have been discovered in cell biological studies that may aid in the prediction of tamoxifen sensitivity and guide in treatment selection. Nonetheless, the transition of such a biomarker from the scientific community towards a diagnostic test that can be used in daily clinical practice has been far from ideal, and such markers seldom face clinical introduction. From a large number of potential predictive biomarkers as described in cell biological literature, the clinical (translational) scientist has to make a decision which of these biomarkers should be tested in clinical material to determine their clinical validity. This problem is not trivial, since patient samples with clinical follow-up are a valuable asset that should therefore be cherished. In this review, we will describe a number of 'cell biological biomarkers' for tamoxifen resistance and their possible clinical implications. This may guide the clinical scientist in choosing what potential biomarkers to test on tumour samples, which may catalyse the translation of scientific discoveries into daily clinical practice of breast cancer medicine.
Affiliation(s)
- Marjolein Droog
  - Department of Molecular Pathology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands