1
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
2
Kazdaghli S, Kerenidis I, Kieckbusch J, Teare P. Improved clinical data imputation via classical and quantum determinantal point processes. eLife 2024; 12:RP89947. [PMID: 38722146 PMCID: PMC11081629 DOI: 10.7554/elife.89947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Imputing data is a critical issue for machine learning practitioners, including in the life sciences, where missing clinical data are typical and the reliability of the imputation is of great importance. Currently, there is no canonical approach for imputing clinical data, and widely used algorithms introduce variance into the downstream classification. Here we propose novel imputation methods based on determinantal point processes (DPP) that enhance popular techniques such as multivariate imputation by chained equations (MICE) and MissForest. Their advantages are twofold: they improve the quality of the imputed data, as demonstrated by increased accuracy of the downstream classification, and they provide deterministic and reliable imputations that remove the variance from the classification results. We experimentally demonstrate the advantages of our methods by performing extensive imputations on synthetic and real clinical data. We also perform quantum hardware experiments by applying quantum circuits for DPP sampling, since such quantum algorithms offer a computational advantage over classical ones. We demonstrate competitive results with up to 10 qubits for small-scale imputation tasks on a state-of-the-art IBM quantum processor. Our classical and quantum methods improve the effectiveness and robustness of clinical data prediction modeling by providing better and more reliable data imputations. These improvements can add significant value in settings demanding high precision, such as pharmaceutical drug trials, where our approach can provide higher confidence in the predictions made.
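To make the DPP idea concrete, the following minimal sketch enumerates every subset of a small, hypothetical similarity kernel `L` and assigns each subset S the probability det(L_S) / det(L + I), the standard L-ensemble definition of a DPP. It is an illustration of the distribution only, not the authors' imputation pipeline or their quantum circuits; the kernel values and the helper names `det`, `submatrix` and `dpp_probabilities` are assumptions made for this example. Subsets of mutually dissimilar items receive more probability mass, which is the diversity property DPP-based sampling exploits.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting (fine for small matrices)."""
    n = len(m)
    a = [row[:] for row in m]  # work on a copy
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(m, idx):
    """Principal submatrix of m indexed by idx."""
    return [[m[i][j] for j in idx] for i in idx]


def dpp_probabilities(L):
    """Exact L-ensemble DPP: P(S) = det(L_S) / det(L + I) for every subset S."""
    n = len(L)
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    z = det(l_plus_i)  # normalizer: equals the sum of det(L_S) over all subsets
    probs = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # The determinant of the empty matrix is 1 by convention.
            probs[subset] = (det(submatrix(L, subset)) if subset else 1.0) / z
    return probs


if __name__ == "__main__":
    # Hypothetical kernel: items 0 and 1 are near-duplicates, item 2 is distinct.
    L = [[1.0, 0.9, 0.1],
         [0.9, 1.0, 0.1],
         [0.1, 0.1, 1.0]]
    probs = dpp_probabilities(L)
    print(round(sum(probs.values()), 6))
    print(probs[(0, 2)] > probs[(0, 1)])  # diverse pairs are favoured
```

Exhaustive enumeration costs 2^n determinant evaluations, so it only suits toy kernels; practical classical samplers typically work from an eigendecomposition of L instead.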
Affiliation(s)
- Jens Kieckbusch
- Emerging Innovations Unit, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Philip Teare
- Centre for AI, Data Science & AI, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
3
Zauner R, Wimmer M, Atzmueller S, Proell J, Niklas N, Ablinger M, Reisenberger M, Lettner T, Illmer J, Dorfer S, Koller U, Guttmann-Gruber C, Hofbauer JP, Bauer JW, Wally V. Biomarker Discovery in Rare Malignancies: Development of a miRNA Signature for RDEB-cSCC. Cancers (Basel) 2023; 15:3286. [PMID: 37444397 DOI: 10.3390/cancers15133286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 06/05/2023] [Accepted: 06/13/2023] [Indexed: 07/15/2023]
Abstract
Machine learning has proven to be a powerful tool for identifying diagnostic tumor biomarkers but is often impeded in rare cancers by small patient numbers. In patients suffering from recessive dystrophic epidermolysis bullosa (RDEB), the early-in-life development of particularly aggressive cutaneous squamous-cell carcinomas (cSCCs) represents a major threat, and timely detection is crucial to facilitate prompt tumor excision. As miRNAs have been shown to hold great potential as liquid biopsy markers, we characterized miRNA signatures derived from cultured primary cells for the potential detection of tumors in RDEB patients. To address the limited accessibility of RDEB samples, we analyzed the similarity of RDEB miRNA profiles with other tumor entities from The Cancer Genome Atlas (TCGA) repository. Because its miRNA expression resembled that of RDEB-SCC, we used head and neck SCC (HN-SCC) data to train a tumor prediction model. Three models of varying complexity, using 33, 10 and 3 miRNAs, were derived from elastic net logistic regression. The predictive performance of all three models was determined on an independent HN-SCC test dataset (AUC-ROC: 100%, 83% and 96%), as well as on cell-based RDEB miRNA-Seq data (AUC-ROC: 100%, 100% and 91%). In addition, the ability of the models to predict tumor samples based on RDEB exosomes (AUC-ROC: 100%, 93% and 100%) demonstrated potential feasibility in a clinical setting. Our results support the feasibility of this approach for identifying a diagnostic miRNA signature by exploiting publicly available data and will lay the foundation for improving early RDEB-SCC detection.
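The AUC-ROC values above can be read as the probability that a randomly chosen tumor sample receives a higher model score than a randomly chosen non-tumor sample. A minimal sketch of that computation (the Mann-Whitney pairwise formulation, with ties counted as one half) follows; the function name `roc_auc` and the labels and scores in the example are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC-ROC via pairwise comparison: the fraction of (positive, negative) pairs
    in which the positive sample scores higher; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical tumor (1) / non-tumor (0) labels and model scores.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75: one of the four pairs is mis-ordered
```

Under this reading, an AUC-ROC of 100% means every tumor sample outscored every non-tumor sample. The pairwise loop is O(P·N), which is adequate for small cohorts; a rank-based implementation scales better for large ones.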
Affiliation(s)
- Roland Zauner
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Monika Wimmer
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Sabine Atzmueller
- Center for Medical Research, Medical Faculty, Johannes-Kepler-University, 4020 Linz, Austria
- Johannes Proell
- Center for Medical Research, Medical Faculty, Johannes-Kepler-University, 4020 Linz, Austria
- Norbert Niklas
- Red Cross Transfusion Service of Upper Austria, 4020 Linz, Austria
- Michael Ablinger
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Manuela Reisenberger
- Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Thomas Lettner
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Julia Illmer
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Sonja Dorfer
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Ulrich Koller
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Christina Guttmann-Gruber
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Josefina Piñón Hofbauer
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Johann W Bauer
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
- Verena Wally
- EB House Austria, Research Program for Molecular Therapy of Genodermatoses, Department of Dermatology & Allergology, University Hospital of the Paracelsus Medical University, 5020 Salzburg, Austria
4
Ju G, Yao Z, Zhao Y, Zhao X, Liu F. Data mining on identifying diagnosis and prognosis biomarkers in head and neck squamous carcinoma. Sci Rep 2023; 13:10020. [PMID: 37340028 DOI: 10.1038/s41598-023-37216-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/06/2022] [Accepted: 06/18/2023] [Indexed: 06/22/2023]
Abstract
Head and neck squamous carcinoma (HNSC) is a major cause of cancer-related death worldwide, so screening for diagnostic and prognostic biomarkers is of great importance. This research aimed to identify specific diagnostic and prognostic biomarkers for HNSC through bioinformatics analysis. Mutation and dysregulation data were acquired from the UCSC Xena and TCGA databases. The ten most frequently mutated genes in HNSC were TP53 (66%), TTN (35%), FAT1 (21%), CDKN2A (20%), MUC16 (17%), CSMD3 (16%), PIK3CA (16%), NOTCH1 (16%), SYNE1 (15%), and LRP1B (14%). A total of 1,060 differentially expressed genes (DEGs) were identified, with 396 up-regulated and 665 down-regulated in HNSC patients. HNSC patients with lower expression of ACTN2 (P = 0.039, HR = 1.3), MYH1 (P = 0.005, HR = 1.5), MYH2 (P = 0.035, HR = 1.3), MYH7 (P = 0.053, HR = 1.3), and NEB (P = 0.0043, HR = 1.5) exhibited longer overall survival. The main DEGs were further analyzed by pan-cancer expression and immune cell infiltration analyses. MYH1, MYH2, and MYH7 were dysregulated in the cancers; their expression levels were lower in the other cancer types than in HNSC. MYH1, MYH2, and MYH7 are therefore proposed as specific diagnostic and prognostic molecular biomarkers of HNSC. All five DEGs have a significant positive correlation with CD4+ T cells and macrophages.
Affiliation(s)
- Guoyuan Ju
- The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, 210029, China
- Zhangyu Yao
- Department of Head and Neck Surgery, Jiangsu Cancer Hospital and Jiangsu Institute of Cancer Research and The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, 210029, China
- Yanbin Zhao
- Department of Head and Neck Surgery, Jiangsu Cancer Hospital and Jiangsu Institute of Cancer Research and The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, 210029, China
- Xiaotong Zhao
- Department of Otorhinolaryngology and Head and Neck Surgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, 221000, Jiangsu, China
- Fangzhou Liu
- Department of Head and Neck Surgery, Jiangsu Cancer Hospital and Jiangsu Institute of Cancer Research and The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, 210029, China
5
Qureshi R, Zou B, Alam T, Wu J, Lee VHF, Yan H. Computational Methods for the Analysis and Prediction of EGFR-Mutated Lung Cancer Drug Resistance: Recent Advances in Drug Design, Challenges and Future Prospects. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:238-255. [PMID: 35007197 DOI: 10.1109/tcbb.2022.3141697] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/04/2023]
Abstract
Lung cancer is a major cause of cancer deaths worldwide and has a very low survival rate. Non-small cell lung cancer (NSCLC) is the largest subset of lung cancers, accounting for about 85% of all cases. It is well established that mutations in the epidermal growth factor receptor (EGFR) can lead to lung cancer. EGFR tyrosine kinase inhibitors (TKIs) were developed to target the kinase domain of EGFR. These TKIs produce promising results in the initial stage of therapy, but their efficacy becomes limited by the development of drug resistance. In this paper, we provide a comprehensive overview of computational methods for understanding drug resistance mechanisms. The important EGFR mutants and the different generations of EGFR-TKIs are discussed, together with their survival and response rates. Next, we evaluate the role of important EGFR parameters in drug resistance mechanisms, including structural dynamics, hydrogen bonds, stability, dimerization, binding free energies, and signaling pathways. Personalized drug resistance prediction models, drug response curves, drug synergy, and other data-driven methods are also discussed. Recent advancements in deep learning, such as AlphaFold2 and deep generative models, as well as big data analytics and the applications of statistics and permutation, are also highlighted. We explore limitations in the current methodologies and discuss strategies to overcome them. We believe this review will serve as a reference for researchers applying computational techniques to precision medicine, the structural analysis of protein-drug complexes, drug discovery, and the understanding of drug response and resistance mechanisms in lung cancer patients.
6
Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022; 14:1906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022]
Abstract
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.
7
Chiesa-Estomba CM, Graña M, Medela A, Sistiaga-Suarez JA, Lechien JR, Calvo-Henriquez C, Mayo-Yanez M, Vaira LA, Grammatica A, Cammaroto G, Ayad T, Fagan JJ. Machine Learning Algorithms as a Computer-Assisted Decision Tool for Oral Cancer Prognosis and Management Decisions: A Systematic Review. ORL J Otorhinolaryngol Relat Spec 2022; 84:278-288. [PMID: 35021182 DOI: 10.1159/000520672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2021] [Accepted: 11/01/2021] [Indexed: 11/19/2022]
Abstract
INTRODUCTION Despite the multiple prognostic indicators described for oral cavity squamous cell carcinoma (OCSCC), its management remains a matter of debate. Machine learning (ML) is a subset of artificial intelligence that enables computers to learn from historical data, gather insights, and make predictions about new data using the learned model. It is therefore a potential tool in the field of head and neck cancer. METHODS We conducted a systematic review. RESULTS A total of 81 manuscripts were reviewed, and 46 studies met the inclusion criteria. Of these, 38 were excluded for the following reasons: use of a classical statistical method (N = 16), not specific to OCSCC (N = 15), and not related to OCSCC survival (N = 7). In total, 8 studies were included in the final analysis. CONCLUSIONS ML has the potential to significantly advance research in the field of OCSCC. A key advantage of ML models is that they can continue to be trained as more data become available. Future ML research will allow us to improve and democratize the application of algorithms for predicting cancer prognosis and guiding its management worldwide.
Affiliation(s)
- Carlos M Chiesa-Estomba
- Otorhinolaryngology - Head & Neck Surgery Department, Hospital Universitario Donostia, Biodonostia Health Research Institute, San Sebastian, Spain
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Manuel Graña
- Computational Intelligence Group, Facultad de Informatica UPV/EHU, San Sebastian, Spain
- Jon A Sistiaga-Suarez
- Otorhinolaryngology - Head & Neck Surgery Department, Hospital Universitario Donostia, Biodonostia Health Research Institute, San Sebastian, Spain
- Jerome R Lechien
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Department of Human Anatomy & Experimental Oncology, University of Mons, Mons, Belgium
- Christian Calvo-Henriquez
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Department of Otolaryngology - Hospital Complex of Santiago de Compostela, Santiago de Compostela, Spain
- Miguel Mayo-Yanez
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Otorhinolaryngology - Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), A Coruña, Spain
- Luigi Angelo Vaira
- Maxillofacial Surgery Unit, University Hospital of Sassari, Sassari, Italy
- Alberto Grammatica
- Department of Otorhinolaryngology - Head and Neck Surgery, University of Brescia, Brescia, Italy
- Giovanni Cammaroto
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Department of Otolaryngology-Head & Neck Surgery, Morgagni Pierantoni Hospital, Forli, Italy
- Tareck Ayad
- Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), San Sebastian, Spain
- Division of Otolaryngology-Head & Neck Surgery, Centre Hospitalier de l'Université de Montréal, Montreal, Québec, Canada
- Johannes J Fagan
- Division of Otolaryngology, Groote Schuur Hospital, University of Cape Town, Cape Town, South Africa
8
Scherer P, Trębacz M, Simidjievski N, Viñas R, Shams Z, Terre HA, Jamnik M, Liò P. Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases. Bioinformatics 2021; 38:1320-1327. [PMID: 34888618 PMCID: PMC8826027 DOI: 10.1093/bioinformatics/btab830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 09/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION Gene expression data are commonly used at the intersection of cancer research and machine learning to better understand the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data because they scale well and remove the need for manual feature engineering. However, gene expression data are often very high dimensional and noisy, with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within the structures of gene interaction graphs, such as protein-protein interaction (PPI) networks, to guide the construction of predictive models. RESULTS We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for the automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network into cancer phenotype prediction tasks. Our computational graphs are constructed using topological clustering algorithms on the PPI networks, which incorporate inductive biases stemming from network biology research on protein complex discovery. Each node in the GINCCo computational graph represents a biological entity, such as a gene, a candidate protein complex or a phenotype, instead of an arbitrary hidden node of a neural network. This provides a biologically relevant mechanism for model regularization, yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes.
Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machines, fully connected multi-layer perceptrons (MLPs) and randomly connected MLPs despite greatly reduced model complexity. AVAILABILITY AND IMPLEMENTATION https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
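The structural idea described above, hidden units that stand for candidate protein complexes and receive input only from their member genes, can be illustrated with a minimal forward pass. This is a hypothetical sketch, not the GINCCo code from the repository above: the gene names, the complex memberships (which GINCCo would obtain by clustering a PPI network), the weights and the functions `build_mask`, `forward` and `count_parameters` are all invented for illustration, and training is omitted.

```python
import math


def build_mask(genes, complexes):
    """mask[c][g] = 1 if gene g belongs to candidate complex c, else 0."""
    index = {g: i for i, g in enumerate(genes)}
    mask = []
    for members in complexes:
        row = [0] * len(genes)
        for g in members:
            row[index[g]] = 1
        mask.append(row)
    return mask


def forward(expression, mask, w_in, w_out, bias=0.0):
    """Genes -> complexes (masked, tanh) -> one phenotype probability (sigmoid)."""
    hidden = []
    for c, row in enumerate(mask):
        # Non-member genes are multiplied by 0, so they cannot influence complex c.
        z = sum(row[g] * w_in[c][g] * x for g, x in enumerate(expression))
        hidden.append(math.tanh(z))
    logit = sum(w * h for w, h in zip(w_out, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def count_parameters(mask):
    """Trainable weights: unmasked gene->complex edges plus complex->output edges."""
    return sum(sum(row) for row in mask) + len(mask)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]          # hypothetical genes
    complexes = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]   # hypothetical PPI clusters
    mask = build_mask(genes, complexes)
    w_in = [[0.5] * len(genes) for _ in complexes]
    w_out = [1.0, -1.0]
    x = [1.0, 0.2, 0.3, 0.0, 0.1, 0.4]
    print(forward(x, mask, w_in, w_out))
    print(count_parameters(mask), "vs", len(genes) * len(complexes) + len(complexes))  # 8 vs 14
```

The parameter count shows the regularizing effect in miniature: the masked layer keeps only membership edges, whereas a fully connected hidden layer of the same width would connect every gene to every unit.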
Affiliation(s)
- Paul Scherer
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK (corresponding author)
- Maja Trębacz
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Nikola Simidjievski
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Ramon Viñas
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Zohreh Shams
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Helena Andres Terre
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Mateja Jamnik
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Pietro Liò
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
9
Chi LH, Wu ATH, Hsiao M, Li YC(J). A Transcriptomic Analysis of Head and Neck Squamous Cell Carcinomas for Prognostic Indications. J Pers Med 2021; 11:782. [PMID: 34442426 PMCID: PMC8399099 DOI: 10.3390/jpm11080782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/06/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 01/27/2023]
Abstract
Survival analysis of The Cancer Genome Atlas (TCGA) dataset is a well-known method for discovering gene expression-based prognostic biomarkers of head and neck squamous cell carcinoma (HNSCC). Because gene expression values are continuous, a cutoff point is usually used in survival analysis to dichotomize patients. Some optimization software exists for cutoff determination; however, its predetermined cutoffs are usually set at the medians or quantiles of gene expression values. In addition, few clinicopathological features are available in pre-processed datasets. We applied an in-house workflow for biomarker discovery, including data retrieval and pre-processing, feature selection, sliding-window cutoff selection, Kaplan-Meier survival analysis, and Cox proportional hazards modeling. In our approach for the TCGA HNSCC cohort, we scanned human protein-coding genes to find optimal cutoff values. After adjustment for confounders, clinical tumor stage and surgical margin involvement were found to be independent risk factors for prognosis. According to the results tables, which show hazard ratios with Bonferroni-adjusted p values under the optimal cutoff, three biomarker candidates, CAMK2N1, CALML5, and FCGBP, are significantly associated with overall survival. We validated this discovery using another independent HNSCC dataset (GSE65858). Thus, we suggest that transcriptomic analysis could help with biomarker discovery. Moreover, the robustness of the biomarkers we identified should be confirmed through additional tests with independent datasets.
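The sliding-window cutoff step can be sketched as follows: candidate cutoffs are taken from a central window of one gene's expression distribution, patients are dichotomized at each cutoff, and the split with the smallest two-group log-rank p value is kept; a Kaplan-Meier estimator is included for describing the resulting groups. This is a hypothetical, dependency-free sketch, not the authors' in-house workflow. The 20%-80% window, the use of the log-rank test as the selection criterion and all function names are assumptions, the Cox modeling step is omitted, and the Bonferroni factor should be the number of tests actually performed (for example, the number of genes scanned).

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events[i] is 1 for death, 0 for censoring."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_chi2(times, events, group):
    """Two-group log-rank chi-square statistic (1 df); group[i] is True for the 'high' group."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, x in enumerate(times) if x >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in the 'high' group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    return o_minus_e ** 2 / var if var > 0 else 0.0


def chi2_1df_pvalue(x):
    """Upper-tail p value of a chi-square(1) statistic: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def sliding_window_cutoff(expr, times, events, lower=0.2, upper=0.8):
    """Scan cutoffs inside the [lower, upper] quantile window; return (cutoff, p) with the smallest p."""
    values = sorted(expr)
    n = len(values)
    best = None
    for cut in sorted(set(values[int(lower * n): int(upper * n) + 1])):
        high = [x > cut for x in expr]
        if all(high) or not any(high):
            continue  # a cutoff must produce two non-empty groups
        p = chi2_1df_pvalue(logrank_chi2(times, events, high))
        if best is None or p < best[1]:
            best = (cut, p)
    if best is None:
        raise ValueError("no valid cutoff in the window")
    return best


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical cohort: higher expression, shorter survival; all deaths observed.
    expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    times = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    events = [1] * 10
    cut, p = sliding_window_cutoff(expr, times, events)
    print(cut, p, bonferroni(p, 7))
```

Choosing the cutoff that minimizes the p value is itself a form of multiple testing, which is one reason an adjustment such as Bonferroni and validation on an independent cohort matter for this kind of scan.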
Affiliation(s)
- Li-Hsing Chi
- The Ph.D. Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University and Academia Sinica, Taipei 11031, Taiwan
- Division of Oral and Maxillofacial Surgery, Department of Dentistry, Wan Fang Hospital, Taipei Medical University, Taipei 11600, Taiwan
- Division of Oral and Maxillofacial Surgery, Department of Dentistry, Taipei Medical University Hospital, Taipei Medical University, Taipei 11031, Taiwan
- Alexander T. H. Wu
- The Ph.D. Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University and Academia Sinica, Taipei 11031, Taiwan
- Michael Hsiao
- Genomics Research Center, Academia Sinica, Taipei 115024, Taiwan
- Department of Biochemistry, College of Medicine, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Yu-Chuan (Jack) Li
- The Ph.D. Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University and Academia Sinica, Taipei 11031, Taiwan
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No.172-1, Sec. 2, Keelung Rd., Taipei 106339, Taiwan
10
Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021; 7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have enabled the use of omic data for training these algorithms. To survey the state of the art, this review covers the main works that have used ML with TCGA data. First, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we turn to the main objective of this study: the identification and discussion of works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions of this work is a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks. It is worth emphasizing the increase in integrative models of multi-omic data analysis. Different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. Notably, a large number of works make use of gene expression data, which researchers have preferred when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour.
For this reason, a greater number of works have focused on the BRCA cohort, while specific works on survival, for example, centred on the GBM cohort because of its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.
Affiliation(s)
- Jose Liñares-Blanco
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Alejandro Pazos
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
- CITIC-Research Center of Information and Communication Technologies, University of A Coruna, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
11
Chaudhari R, Fong LW, Tan Z, Huang B, Zhang S. An up-to-date overview of computational polypharmacology in modern drug discovery. Expert Opin Drug Discov 2020; 15:1025-1044. [PMID: 32452701 PMCID: PMC7415563 DOI: 10.1080/17460441.2020.1767063] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/31/2020] [Accepted: 05/06/2020] [Indexed: 12/30/2022]
Abstract
INTRODUCTION: In recent years, computational polypharmacology has gained significant attention as a way to study the promiscuous nature of drugs. Despite tremendous challenges, community-wide efforts have led to a variety of novel approaches for predicting drug polypharmacology. In particular, rapid advances using machine learning and artificial intelligence have been reported with great success.
AREAS COVERED: In this article, the authors provide a comprehensive update on current state-of-the-art polypharmacology approaches and their applications, focusing on reports published after our 2017 review article. The authors particularly discuss some novel, groundbreaking concepts and methods that have recently been developed and applied to drug polypharmacology studies.
EXPERT OPINION: Polypharmacology is evolving, and novel concepts are being introduced to counter the current challenges in the field. However, major hurdles remain, including the incompleteness of high-quality experimental data, a lack of in vitro and in vivo assays to characterize multi-targeting agents, a shortage of robust computational methods, and the difficulty of identifying the best target combinations and designing effective multi-targeting agents. Fortunately, numerous national and international efforts, including multi-omics and artificial intelligence initiatives as well as recent collaborations addressing the COVID-19 pandemic, have shown significant promise for propelling the field of polypharmacology forward.
Affiliation(s)
- Rajan Chaudhari
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, United States
- Long Wolf Fong
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, United States
- MD Anderson UTHealth Graduate School of Biomedical Sciences, 6767 Bertner Avenue, Houston, Texas 77030, United States
- Zhi Tan
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, United States
- Beibei Huang
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, United States
- Shuxing Zhang
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, United States
- MD Anderson UTHealth Graduate School of Biomedical Sciences, 6767 Bertner Avenue, Houston, Texas 77030, United States
12
Kim AA, Rachid Zaim S, Subbian V. Assessing reproducibility and veracity across machine learning techniques in biomedicine: A case study using TCGA data. Int J Med Inform 2020; 141:104148. [DOI: 10.1016/j.ijmedinf.2020.104148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/30/2019] [Revised: 03/22/2020] [Accepted: 04/16/2020] [Indexed: 11/28/2022]
13
de Anda-Jáuregui G, Hernández-Lemus E. Computational Oncology in the Multi-Omics Era: State of the Art. Front Oncol 2020; 10:423. [PMID: 32318338 PMCID: PMC7154096 DOI: 10.3389/fonc.2020.00423] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 12/01/2019] [Accepted: 03/10/2020] [Indexed: 12/24/2022]
Abstract
Cancer is the quintessential complex disease. As technologies evolve faster each day, we are able to quantify the different layers of biological elements that contribute to the emergence and development of malignancies. In this multi-omics context, integrative approaches are mandatory to gain further insights into oncological phenomena and to move toward the precision medicine paradigm. In this review, we will focus on computational oncology as an integrative discipline that incorporates knowledge from the mathematical, physical, and computational fields to further the biomedical understanding of cancer. We will discuss the current roles of computation in oncology in the context of multi-omic technologies, which include: data acquisition and processing; data management in the clinical and research settings; classification, diagnosis, and prognosis; and the development of models in the research setting, including their use for therapeutic target identification. We will discuss machine learning and network approaches as two of the most promising emerging paradigms in computational oncology. These approaches provide a foundation for integrating different layers of biological description into coherent frameworks that allow advances in both basic and clinical settings.
Affiliation(s)
- Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Cátedras Conacyt Para Jóvenes Investigadores, National Council on Science and Technology, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
14
Zhu W, Xie L, Han J, Guo X. The Application of Deep Learning in Cancer Prognosis Prediction. Cancers (Basel) 2020; 12:E603. [PMID: 32150991 PMCID: PMC7139576 DOI: 10.3390/cancers12030603] [Citation(s) in RCA: 120] [Impact Index Per Article: 30.0] [Received: 02/04/2020] [Revised: 02/28/2020] [Accepted: 03/02/2020] [Indexed: 12/11/2022]
Abstract
Deep learning has been applied to many areas in health care, including imaging diagnosis, digital pathology, prediction of hospital admission, drug design, classification of cancer and stromal cells, and doctor assistance. Cancer prognosis aims to estimate the fate of cancer and the probabilities of cancer recurrence and progression, and to provide survival estimates to patients. More accurate cancer prognosis prediction will greatly benefit the clinical management of cancer patients. Improvements in biomedical translational research and the application of advanced statistical analysis and machine learning methods are the driving forces behind better cancer prognosis prediction. In recent years, there has been a significant increase in computational power and rapid advancement in artificial intelligence technology, particularly deep learning. In addition, the falling cost of large-scale next-generation sequencing and the availability of such data through open databases (e.g., TCGA and GEO) offer opportunities to build more powerful models that predict cancer prognosis more accurately. In this review, we survey the most recently published works that used deep learning to build models for cancer prognosis prediction. Deep learning has been suggested to be a more generic model that requires less data engineering and achieves more accurate predictions when working with large amounts of data. The application of deep learning to cancer prognosis has been shown to be equivalent to or better than current approaches, such as Cox-PH. With the surge of multi-omics data, including genomics data, transcriptomics data and clinical information in cancer studies, we believe that deep learning could further improve cancer prognosis.
Affiliation(s)
- Wan Zhu
- Department of Preventive Medicine, Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics center, School of Basic Medical Sciences, Henan University, Kaifeng 475004, China
- Department of Anesthesia, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA
- Longxiang Xie
- Department of Preventive Medicine, Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics center, School of Basic Medical Sciences, Henan University, Kaifeng 475004, China
- Jianye Han
- Department of Computer Science, University of Illinois, Urbana-Champaign, IL 61820, USA
- Xiangqian Guo
- Department of Preventive Medicine, Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics center, School of Basic Medical Sciences, Henan University, Kaifeng 475004, China