1
Peixoto C, Lopes MB, Martins M, Casimiro S, Sobral D, Grosso AR, Abreu C, Macedo D, Costa AL, Pais H, Alvim C, Mansinho A, Filipe P, Costa PMD, Fernandes A, Borralho P, Ferreira C, Malaquias J, Quintela A, Kaplan S, Golkaram M, Salmans M, Khan N, Vijayaraghavan R, Zhang S, Pawlowski T, Godsey J, So A, Liu L, Costa L, Vinga S. Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization. BMC Bioinformatics 2023; 24:17. [PMID: 36647008 PMCID: PMC9841719 DOI: 10.1186/s12859-022-05104-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2022] [Accepted: 12/07/2022] [Indexed: 01/18/2023] Open
Abstract
Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA-sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in the methods' accuracy. Further classification using the genes pre-selected by the different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, the use of network-based correlation information (iTwiner) for gene selection produced the best classification results and identified more stable and robust gene sets. Some of the selected genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602), and regulation of gene transcription (NME2P2).
We show that classifying CRC patients based on features pre-selected by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models' predictive performance. Moreover, the use of correlation-based penalization for biomarker selection stands as a promising strategy for predicting patient groups based on RNA-seq data.
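Approach (3) in the abstract, pre-selecting genes with regularized logistic regression and then classifying on the selected genes only, can be illustrated with a minimal sketch. This is not the authors' code: it uses a plain Elastic Net penalty fitted by proximal gradient descent rather than the iTwiner regularizer, refits a logistic regression rather than the classifiers named in the abstract, and runs on synthetic data. All function names, parameter values, and the data itself are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_logistic(X, y, lam=0.0, alpha=0.5, step=0.5, n_iter=300):
    """Proximal-gradient logistic regression with the Elastic Net penalty
    lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    Hypothetical helper; lam=0 gives an unpenalized fit."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge), then L1 shrinkage.
            smooth = grad_w[j] + lam * (1.0 - alpha) * w[j]
            w[j] = soft_threshold(w[j] - step * smooth, step * lam * alpha)
    return w, b

def accuracy(X, y, w, b):
    # Fraction of samples whose predicted class (probability >= 0.5) matches y.
    hits = sum(
        (sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)

# Synthetic stand-in for an expression matrix: 100 samples x 20 "genes",
# where only gene 0 determines the (assumed) binary outcome.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
y = [1 if row[0] > 0 else 0 for row in X]

# Step 1: penalized fit on all features; nonzero weights define the pre-selection.
weights, _ = fit_logistic(X, y, lam=0.1, alpha=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

# Step 2: refit an unpenalized classifier on the pre-selected features only.
X_sel = [[row[j] for j in selected] for row in X]
w_sel, b_sel = fit_logistic(X_sel, y, lam=0.0)

# In-sample accuracy only; a real analysis would use held-out data.
train_accuracy = accuracy(X_sel, y, w_sel, b_sel)
print("selected features:", selected)
print("training accuracy:", train_accuracy)
```

The L1 part of the penalty drives most uninformative weights to exactly zero, which is what makes the fitted model usable as a feature selector before a second classifier is trained.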
Affiliation(s)
- Carolina Peixoto
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal
- Marta B. Lopes
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), NOVA School of Science and Technology, 2829-516 Caparica, Portugal
  - Center for Mathematics and Applications (NOVA MATH), NOVA School of Science and Technology (FCT NOVA), 2829-516 Caparica, Portugal
- Marta Martins
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Sandra Casimiro
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Daniel Sobral
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
  - UCIBIO - Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Ana Rita Grosso
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
  - UCIBIO - Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Catarina Abreu
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Daniela Macedo
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Ana Lúcia Costa
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Helena Pais
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Cecília Alvim
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- André Mansinho
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Pedro Filipe
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Pedro Marques da Costa
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Afonso Fernandes
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Paula Borralho
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
- Cristina Ferreira
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- João Malaquias
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- António Quintela
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Shannon Kaplan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Mahdi Golkaram
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Michael Salmans
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Nafeesa Khan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Raakhee Vijayaraghavan
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Shile Zhang
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Traci Pawlowski
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Jim Godsey
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Alex So
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Li Liu
  - Illumina Inc., 5200 Illumina Way, San Diego, CA 92122 USA
- Luís Costa
  - Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisbon, Portugal
  - Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, Lisbon, Portugal
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal
  - IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisbon, Portugal
3
Lopes MB, Martins EP, Vinga S, Costa BM. The Role of Network Science in Glioblastoma. Cancers (Basel) 2021; 13:1045. [PMID: 33801334 PMCID: PMC7958335 DOI: 10.3390/cancers13051045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 12/13/2022] Open
Abstract
Network science has long been recognized as a well-established discipline across many biological domains. In the particular case of cancer genomics, network discovery is challenged by the multitude of available high-dimensional, heterogeneous views of data. Glioblastoma (GBM) is an example of such a complex and heterogeneous disease that can be tackled by network science. Identifying the architecture of molecular GBM networks is essential to understanding the information flow and to better informing drug development and pre-clinical studies. Here, we review network-based strategies that have been used in the study of GBM, along with the available software implementations for reproducibility and further testing on new datasets. Promising results have been obtained from both bulk and single-cell GBM data, placing network discovery at the forefront of developing molecularly informed personalized medicine.
Affiliation(s)
- Marta B. Lopes
  - Center for Mathematics and Applications (CMA), FCT, UNL, 2829-516 Caparica, Portugal
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, 2829-516 Caparica, Portugal
- Eduarda P. Martins
  - Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  - ICVS/3B's - PT Government Associate Laboratory, 4710-057/4805-017 Braga/Guimarães, Portugal
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1000-029 Lisbon, Portugal
  - IDMEC, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
- Bruno M. Costa
  - Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  - ICVS/3B's - PT Government Associate Laboratory, 4710-057/4805-017 Braga/Guimarães, Portugal